Song Mei, Assistant Professor
Departments of Statistics and Electrical Engineering and Computer Sciences, UC Berkeley
Broad 2100A
Abstract:
Neural sequence models based on the transformer architecture have demonstrated remarkable in-context learning (ICL) abilities: they can perform new tasks when prompted with training and test examples, without any parameter update to the model. In this talk, we theoretically investigate the strong ICL abilities of transformers. We first provide a statistical theory for transformers performing ICL, deriving end-to-end quantitative results for the expressive power, in-context prediction power, and sample complexity of pre-training. Concretely, we show that transformers can implement a broad class of standard machine learning algorithms in context, such as least squares, ridge regression, Lasso, convex risk minimization for generalized linear models (including logistic regression), and gradient descent on two-layer neural networks, with near-optimal predictive power on various in-context data distributions. Building on these “base” ICL algorithms, intriguingly, we show that transformers can implement more complex ICL procedures involving in-context algorithm selection, akin to what a statistician can do in real life: a single transformer can adaptively select different base ICL algorithms, or even perform qualitatively different tasks, on different input sequences, without any explicit prompting of the right algorithm or task.
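To make the in-context learning setup concrete, the sketch below (not part of the talk materials) illustrates one of the “base” ICL algorithms the abstract refers to: ridge regression applied to the labeled examples in a prompt to predict the label of a query input. The function name, prompt format, and regularization choice are illustrative assumptions; the talk's results concern transformers that emulate such maps internally, which this standalone code does not model.

```python
# A minimal sketch of the in-context regression setup: a "prompt" is a set of
# labeled examples (x_i, y_i) plus a query x_query, and a base ICL algorithm
# (here, ridge regression) maps the prompt to a prediction without any
# parameter update. Names and constants are illustrative assumptions.
import numpy as np

def ridge_in_context(X_prompt, y_prompt, x_query, lam=0.1):
    """Predict the label at x_query from the in-context examples via ridge regression.

    X_prompt: (n, d) array of in-context inputs
    y_prompt: (n,) array of in-context labels
    x_query:  (d,) query input appended to the prompt
    lam:      ridge regularization strength (an illustrative choice)
    """
    n, d = X_prompt.shape
    # Closed-form ridge solution: w = (X^T X + lam * n * I)^{-1} X^T y
    w = np.linalg.solve(X_prompt.T @ X_prompt + lam * n * np.eye(d),
                        X_prompt.T @ y_prompt)
    return float(x_query @ w)

# Toy in-context data drawn from a noisy linear model.
rng = np.random.default_rng(0)
d, n = 5, 40
w_star = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d))
y = X @ w_star + 0.1 * rng.normal(size=n)
x_query = rng.normal(size=d)

print("ridge prediction:", ridge_in_context(X, y, x_query))
print("noiseless target:", float(x_query @ w_star))
```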
Bio:
Song Mei is an assistant professor of statistics at UC Berkeley. He received his Ph.D. from Stanford in June 2020. Song's research lies at the intersection of statistics and machine learning. His recent research interests include high-dimensional statistical inference, theory of deep learning, and theory of reinforcement learning.