Quanquan Gu, Assistant Professor
UCLA Department of Computer Science
Deep learning has achieved tremendous success in many applications. However, why deep learning is so powerful is still not well understood. One of the mysteries is that deep neural networks (DNNs) used in practice are often heavily over-parameterized, to the point that they can fit even random labels on the input data, yet they still achieve very small test error when trained with real labels. To understand this phenomenon, in this talk I will first show that, with over-parameterization and a proper random initialization, gradient-based methods can find the global minima of the training loss for DNNs with the ReLU activation function. Then I will show that, under certain assumptions on the data distribution, gradient descent with a proper random initialization is able to train a sufficiently over-parameterized DNN to achieve arbitrarily small test error. This leads to an algorithm-dependent generalization error bound for deep learning. I will conclude by discussing implications, challenges, and future work along this line of research.
Quanquan Gu is an Assistant Professor of Computer Science at UCLA. His current research is in the area of artificial intelligence and machine learning, with a focus on developing and analyzing nonconvex optimization algorithms for machine learning to understand large-scale, dynamic, complex, and heterogeneous data, and on building the theoretical foundations of deep learning. He received his Ph.D. degree in Computer Science from the University of Illinois at Urbana-Champaign in 2014. He is a recipient of the Yahoo! Academic Career Enhancement Award in 2015, the NSF CAREER Award in 2017, the Adobe Data Science Research Award and the Salesforce Deep Learning Research Award in 2018, and the Simons Berkeley Research Fellowship in 2019.