Upcoming Weekly Seminar Series

How to Subscribe to the UCLA Statistics Seminars Mailing List

Join the UCLA Statistics seminars mailing list by sending an email to sympa@sympa.it.ucla.edu with “subscribe stat_seminars” (without quotation marks) in the subject field and the message body blank. This must be sent from the address that is to be subscribed. You will then receive a confirmation request; reply to it, and an automated email will confirm that you have been added.

How to Unsubscribe from the UCLA Statistics Seminars Mailing List

You may be receiving our seminar emails because you are directly subscribed to our seminars mailing list, or because you are one of our graduate students, undergraduate students, faculty, etc., who are subscribed to a different mailing list that also receives the seminar emails. If the former, you may unsubscribe from the seminars mailing list by sending an email to sympa@sympa.it.ucla.edu with “unsubscribe stat_seminars” (without quotation marks) in the subject field and the message body blank. This must be sent from the address that is subscribed. After sending that email, follow the directions in the response that you receive.

Viewing our Seminars Remotely

When viewing one of our live seminars remotely, it is best to set your Zoom view to “Side-by-side: Speaker View”. You can see details of how to do this here.


Thursday, 05/30/2024, Time: 11:00am-12:15pm, On the Implicit Optimization Bias of Next-Token Prediction

Speaker: Christos Thrampoulidis, Assistant Professor
Department of Electrical and Computer Engineering, University of British Columbia

Abstract:

The talk explores optimization principles of next-token prediction (NTP), which has become the go-to paradigm for training modern language models. We frame NTP as cross-entropy optimization across distinct contexts, each tied to a sparse conditional probability distribution over a finite vocabulary. This leads us to introduce “NTP-separability conditions,” which enable reaching the entropy lower bound of the NTP objective. We then focus on NTP-trained linear models, for which we fully specify the optimization bias of gradient descent. Our analysis highlights the key role played by the sparsity pattern of the contexts’ conditional distributions and introduces an NTP-specific notion of margin. We also investigate a log-bilinear NTP model, which abstracts sufficiently expressive language models: in large embedding spaces, we can characterize the geometry of word and context embeddings in relation to an NTP-margin-maximizing logit matrix, which separates out-of-support words. Through experiments, we show how this optimization perspective establishes new links between geometric features of the embeddings and textual structures.
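To make the abstract’s framing concrete, here is a minimal numerical sketch (our illustration, not the speaker’s code; the toy dimensions and names such as ntp_loss are hypothetical). It sets up a handful of distinct contexts, each with a sparse conditional next-token distribution over a small vocabulary, trains a linear logit model by gradient descent on the NTP cross-entropy, and compares the final loss to the average conditional entropy of the targets, the lower bound the abstract refers to.

import numpy as np

rng = np.random.default_rng(0)

# Toy setup: V-word vocabulary, m distinct contexts. Each context j has a
# sparse conditional distribution p[j] supported on a few in-support words.
V, m, d = 8, 5, 16                     # vocab size, number of contexts, embedding dim
support_size = 3
p = np.zeros((m, V))
for j in range(m):
    S = rng.choice(V, size=support_size, replace=False)
    w = rng.random(support_size)
    p[j, S] = w / w.sum()              # sparse conditional next-token law

# Linear (logit) model: fixed context embeddings h[j], trainable decoder W.
h = rng.standard_normal((m, d))
W = rng.standard_normal((V, d)) * 0.1

def ntp_loss(W, h, p):
    """NTP objective: cross-entropy between the model's softmax over the
    vocabulary and each context's conditional distribution, averaged
    uniformly over the m distinct contexts."""
    logits = h @ W.T                              # shape (m, V)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    logZ = np.log(np.exp(logits).sum(axis=1))     # log-partition per context
    ce = -(p * (logits - logZ[:, None])).sum(axis=1)
    return ce.mean()

# Entropy lower bound: the NTP loss can never fall below the average
# conditional entropy of the target distributions.
entropy = -(p * np.log(p, where=p > 0, out=np.zeros_like(p))).sum(axis=1).mean()

# Plain gradient descent on W, using the analytic softmax gradient.
lr = 0.5
for _ in range(500):
    logits = h @ W.T
    soft = np.exp(logits - logits.max(axis=1, keepdims=True))
    soft /= soft.sum(axis=1, keepdims=True)
    grad = (soft - p).T @ h / m                   # d(loss)/dW
    W -= lr * grad

print(f"NTP loss after GD: {ntp_loss(W, h, p):.4f}  (entropy bound: {entropy:.4f})")

In this sketch, the printed loss approaching the entropy value illustrates the sense in which the targets’ entropy is the floor of the NTP objective; under the separability conditions the abstract describes, gradient descent drives the loss toward that floor while the logits keep growing, which is the implicit bias the talk characterizes.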

Biography:

Dr. Thrampoulidis is an Assistant Professor in the Department of Electrical and Computer Engineering at the University of British Columbia. Previously, he was an Assistant Professor at the University of California, Santa Barbara and a Postdoctoral Researcher at MIT. He received an M.Sc. and a Ph.D. in Electrical Engineering, in 2012 and 2016 respectively, both from Caltech, with a minor in Applied and Computational Mathematics. In 2011, he received a Diploma in ECE from the University of Patras, Greece. His research is on machine learning, high-dimensional statistics, and optimization.