The anisotropic noise in stochastic gradient descent: Its behavior of escaping from sharp minima and regularization effects Z Zhu, J Wu, B Yu, L Wu, J Ma International Conference on Machine Learning (ICML 2019), 2018 | 225 | 2018 |
How SGD selects the global minima in over-parameterized learning: A dynamical stability perspective L Wu, C Ma, W E Advances in Neural Information Processing Systems (NeurIPS 2018), 2018 | 224 | 2018 |
Towards understanding generalization of deep learning: perspective of loss landscapes L Wu, Z Zhu, W E ICML 2017 Workshop on Principled Approaches to Deep Learning, 2017 | 214 | 2017 |
The Barron space and the flow-induced function spaces for neural network models W E, C Ma, L Wu Constructive Approximation 55 (1), 369-406, 2022 | 193* | 2022 |
Towards understanding and improving the transferability of adversarial examples in deep neural networks L Wu, Z Zhu Asian Conference on Machine Learning, 837-850, 2020 | 160* | 2020 |
A priori estimates of the population risk for two-layer neural networks W E, C Ma, L Wu Communications in Mathematical Sciences 17 (5), 1407-1425, 2019 | 118* | 2019 |
Towards a mathematical understanding of neural network-based machine learning: what we know and what we don't W E, C Ma, S Wojtowytsch, L Wu CSIAM Transactions on Applied Mathematics 1 (4), 561-615, 2020 | 117* | 2020 |
A comparative analysis of optimization and generalization properties of two-layer neural network and random feature models under gradient descent dynamics W E, C Ma, L Wu Science China Mathematics 63 (7), 1235-1258, 2020 | 98 | 2020 |
Machine learning from a continuous viewpoint, I C Ma, L Wu Science China Mathematics 63 (11), 2233-2266, 2020 | 69 | 2020 |
Beyond the quadratic approximation: the multiscale structure of neural network loss landscapes C Ma, D Kunin, L Wu, L Ying Journal of Machine Learning, 2022 | 53* | 2022 |
The anisotropic noise in stochastic gradient descent: Its behavior of escaping from minima and regularization effects Z Zhu, J Wu, B Yu, L Wu, J Ma stat 1050, 21, 2018 | 44 | 2018 |
Irreversible samplers from jump and continuous Markov processes YA Ma, EB Fox, T Chen, L Wu Statistics and Computing, 1-26, 2018 | 41* | 2018 |
The alignment property of SGD noise and how it helps select flat minima: A stability analysis L Wu, M Wang, WJ Su NeurIPS 2022, 2022 | 33* | 2022 |
Global Convergence of Gradient Descent for Deep Linear Residual Networks L Wu, Q Wang, C Ma Advances in Neural Information Processing Systems (NeurIPS 2019), 2019 | 28 | 2019 |
Machine learning based non-Newtonian fluid model with molecular fidelity H Lei, L Wu, W E Physical Review E 102 (4), 043309, 2020 | 20 | 2020 |
Complexity measures for neural networks with general activation functions using path-based norms Z Li, C Ma, L Wu arXiv preprint arXiv:2009.06132, 2020 | 19 | 2020 |
A qualitative study of the dynamic behavior for adaptive gradient algorithms C Ma, L Wu, W E Mathematical and Scientific Machine Learning, 671-692, 2022 | 17 | 2022 |
Learning a single neuron for non-monotonic activation functions L Wu AISTATS 2022, 2022 | 16 | 2022 |
Approximation analysis of convolutional neural networks C Bao, Q Li, Z Shen, C Tai, L Wu, X Xiang work 65, 871, 2014 | 16 | 2014 |
The implicit regularization of dynamical stability in stochastic gradient descent L Wu, WJ Su ICML 2023, 2023 | 15 | 2023 |