The anisotropic noise in stochastic gradient descent: Its behavior of escaping from sharp minima and regularization effects. Z Zhu, J Wu, B Yu, L Wu, J Ma. International Conference on Machine Learning (ICML), 2019. | 289* | 2019 |
How SGD selects the global minima in over-parameterized learning: A dynamical stability perspective. L Wu, C Ma, W E. Advances in Neural Information Processing Systems (NeurIPS), 2018. | 245 | 2018 |
Towards understanding generalization of deep learning: perspective of loss landscapes. L Wu, Z Zhu, W E. ICML 2017 Workshop on Principled Approaches to Deep Learning, 2017. | 226 | 2017 |
The Barron space and the flow-induced function spaces for neural network models. W E, C Ma, L Wu. Constructive Approximation 55 (1), 369-406, 2022. | 219* | 2022 |
Towards understanding and improving the transferability of adversarial examples in deep neural networks. L Wu, Z Zhu. Asian Conference on Machine Learning, 837-850, 2020. | 172* | 2020 |
Towards a mathematical understanding of neural network-based machine learning: what we know and what we don't. W E, C Ma, S Wojtowytsch, L Wu. CSIAM Transactions on Applied Mathematics 1 (4), 561-615, 2020. | 130* | 2020 |
A priori estimates of the population risk for two-layer neural networks. W E, C Ma, L Wu. Communications in Mathematical Sciences 17 (5), 1407-1425, 2019. | 130* | 2019 |
A comparative analysis of optimization and generalization properties of two-layer neural network and random feature models under gradient descent dynamics. W E, C Ma, L Wu. Science China Mathematics 63 (7), 1235-1258, 2020. | 109 | 2020 |
Machine learning from a continuous viewpoint, I. C Ma, L Wu. Science China Mathematics 63 (11), 2233-2266, 2020. | 71 | 2020 |
Beyond the quadratic approximation: the multiscale structure of neural network loss landscapes. C Ma, D Kunin, L Wu, L Ying. Journal of Machine Learning, 2022. | 58* | 2022 |
The alignment property of SGD noise and how it helps select flat minima: A stability analysis. L Wu, M Wang, WJ Su. Advances in Neural Information Processing Systems (NeurIPS), 2022. | 43* | 2022 |
Irreversible samplers from jump and continuous Markov processes. YA Ma, EB Fox, T Chen, L Wu. Statistics and Computing, 1-26, 2018. | 41* | 2018 |
Global Convergence of Gradient Descent for Deep Linear Residual Networks. L Wu, Q Wang, C Ma. Advances in Neural Information Processing Systems (NeurIPS), 2019. | 31 | 2019 |
A qualitative study of the dynamic behavior for adaptive gradient algorithms. C Ma, L Wu, W E. Mathematical and Scientific Machine Learning, 671-692, 2022. | 23 | 2022 |
Machine learning based non-Newtonian fluid model with molecular fidelity. H Lei, L Wu, W E. Physical Review E 102 (4), 043309, 2020. | 23 | 2020 |
Complexity measures for neural networks with general activation functions using path-based norms. Z Li, C Ma, L Wu. arXiv preprint arXiv:2009.06132, 2020. | 22 | 2020 |
Learning a single neuron for non-monotonic activation functions. L Wu. International Conference on Artificial Intelligence and Statistics (AISTATS), 2022. | 19 | 2022 |
The implicit regularization of dynamical stability in stochastic gradient descent. L Wu, WJ Su. International Conference on Machine Learning (ICML), 2023. | 18 | 2023 |
Theoretical analysis of inductive biases in deep convolutional networks. Z Wang, L Wu. Advances in Neural Information Processing Systems (NeurIPS), 2023. | 18 | 2023 |
Approximation analysis of convolutional neural networks. C Bao, Q Li, Z Shen, C Tai, L Wu, X Xiang. East Asian Journal on Applied Mathematics 13 (3), 524-549, 2014. | 18 | 2014 |