Xiaoxia (Shirley) Wu 吴晓霞
Other names: Xiaoxia Wu
AdaGrad stepsizes: Sharp convergence over nonconvex landscapes
R Ward, X Wu, L Bottou
The Journal of Machine Learning Research 21 (1), 9047-9076, 2020
When Do Curricula Work?
X Wu, E Dyer, B Neyshabur
International Conference on Learning Representations, 2021
WNGrad: Learn the learning rate in gradient descent
X Wu, R Ward, L Bottou
arXiv preprint arXiv:1803.02865, 2018
Global convergence of adaptive gradient methods for an over-parameterized neural network
X Wu, SS Du, R Ward
arXiv preprint arXiv:1902.07111, 2019
Linear convergence of adaptive stochastic gradient descent
Y Xie, X Wu, R Ward
International Conference on Artificial Intelligence and Statistics, 1475-1485, 2020
Hierarchical learning for generation with long source sequences
T Rohde, X Wu, Y Liu
arXiv preprint arXiv:2104.07545, 2021
Choosing the Sample with Lowest Loss makes SGD Robust
V Shah, X Wu, S Sanghavi
International Conference on Artificial Intelligence and Statistics 108, 2120 …, 2020
Value-at-Risk estimation with stochastic interest rate models for option-bond portfolios
X Wang, D Xie, J Jiang, X Wu, J He
Finance Research Letters 21, 10-20, 2017
Implicit Regularization and Convergence for Weight Normalization
X Wu, E Dobriban, T Ren, S Wu, Z Li, S Gunasekar, R Ward, Q Liu
Advances in Neural Information Processing Systems 33, 2020
ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
Z Yao, RY Aminabadi, M Zhang, X Wu, C Li, Y He
Advances in Neural Information Processing Systems, 2022
LEAP: Learnable Pruning for Transformer-based Models
Z Yao, X Wu, L Ma, S Shen, K Keutzer, MW Mahoney, Y He
arXiv preprint arXiv:2105.14636, 2021
Adaptive differentially private empirical risk minimization
X Wu, L Wang, I Cristali, Q Gu, R Willett
arXiv preprint arXiv:2110.07435, 2021
An optimal mortgage refinancing strategy with stochastic interest rate
X Wu, D Xie, DA Edwards
Computational Economics 53, 1353-1375, 2019
XTC: Extreme Compression for Pre-trained Transformers Made Simple and Efficient
X Wu, Z Yao, M Zhang, C Li, Y He
Advances in Neural Information Processing Systems, 2022
AdaLoss: A computationally-efficient and provably convergent adaptive gradient method
X Wu, Y Xie, SS Du, R Ward
Proceedings of the AAAI Conference on Artificial Intelligence 36 (8), 8691-8699, 2022
Random-LTD: Random and Layerwise Token Dropping Brings Efficient Training for Large-scale Transformers
Z Yao, X Wu, C Li, C Holmes, M Zhang, C Li, Y He
arXiv preprint arXiv:2211.11586, 2022
A Comprehensive Study on Post-Training Quantization for Large Language Models
Z Yao, C Li, X Wu, S Youn, Y He
arXiv preprint arXiv:2303.08302, 2023
Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases
X Wu, C Li, RY Aminabadi, Z Yao, Y He
arXiv preprint arXiv:2301.12017, 2023
DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing
C Li, Z Yao, X Wu, M Zhang, Y He
arXiv preprint arXiv:2212.03597, 2022
Optimal exercise frontier of Bermudan options by simulation methods
D Xie, DA Edwards, X Wu
International Journal of Financial Engineering 9 (03), 2250013, 2022