The anisotropic noise in stochastic gradient descent: Its behavior of escaping from sharp minima and regularization effects Z Zhu, J Wu, B Yu, L Wu, J Ma International Conference on Machine Learning (ICML 2019), 2018 | 217 | 2018 |

How SGD selects the global minima in over-parameterized learning: A dynamical stability perspective L Wu, C Ma, W E Advances in Neural Information Processing Systems (NeurIPS 2018), 2018 | 216 | 2018 |

Towards understanding generalization of deep learning: perspective of loss landscapes L Wu, Z Zhu, W E ICML 2017 Workshop on Principled Approaches to Deep Learning, 2017 | 206 | 2017 |

The Barron space and the flow-induced function spaces for neural network models W E, C Ma, L Wu Constructive Approximation 55 (1), 369-406, 2022 | 184* | 2022 |

Towards understanding and improving the transferability of adversarial examples in deep neural networks L Wu, Z Zhu Asian Conference on Machine Learning, 837-850, 2020 | 156* | 2020 |

A priori estimates of the population risk for two-layer neural networks W E, C Ma, L Wu Communications in Mathematical Sciences 17 (5), 1407-1425, 2019 | 113* | 2019 |

Towards a mathematical understanding of neural network-based machine learning: what we know and what we don't W E, C Ma, S Wojtowytsch, L Wu CSIAM Transactions on Applied Mathematics 1 (4), 561-615, 2020 | 112* | 2020 |

A comparative analysis of optimization and generalization properties of two-layer neural network and random feature models under gradient descent dynamics W E, C Ma, L Wu Science China Mathematics, 2019 | 94 | 2019 |

Machine learning from a continuous viewpoint, I C Ma, L Wu Science China Mathematics 63 (11), 2233-2266, 2020 | 67 | 2020 |

Beyond the quadratic approximation: the multiscale structure of neural network loss landscapes C Ma, D Kunin, L Wu, L Ying Journal of Machine Learning, 2022 | 49* | 2022 |

The anisotropic noise in stochastic gradient descent: Its behavior of escaping from minima and regularization effects Z Zhu, J Wu, B Yu, L Wu, J Ma | 44 | 2018 |

Irreversible samplers from jump and continuous Markov processes YA Ma, EB Fox, T Chen, L Wu Statistics and Computing, 1-26, 2018 | 40* | 2018 |

The alignment property of SGD noise and how it helps select flat minima: A stability analysis L Wu, M Wang, WJ Su NeurIPS 2022, 2022 | 35* | 2022 |

Global Convergence of Gradient Descent for Deep Linear Residual Networks L Wu, Q Wang, C Ma Advances in Neural Information Processing Systems (NeurIPS 2019), 2019 | 27 | 2019 |

Machine learning based non-Newtonian fluid model with molecular fidelity H Lei, L Wu, W E Physical Review E 102 (4), 043309, 2020 | 20 | 2020 |

Complexity measures for neural networks with general activation functions using path-based norms Z Li, C Ma, L Wu arXiv preprint arXiv:2009.06132, 2020 | 17 | 2020 |

Learning a single neuron for non-monotonic activation functions L Wu AISTATS 2022, 2022 | 16 | 2022 |

Approximation analysis of convolutional neural networks C Bao, Q Li, Z Shen, C Tai, L Wu, X Xiang work 65, 871, 2014 | 15 | 2014 |

A qualitative study of the dynamic behavior for adaptive gradient algorithms C Ma, L Wu, W E Mathematical and Scientific Machine Learning, 671-692, 2022 | 14 | 2022 |

The Slow Deterioration of the Generalization Error of the Random Feature Model C Ma, L Wu, W E Mathematical and Scientific Machine Learning (MSML) 2020, 2020 | 14 | 2020 |