Thursday, 05/30/2024, Time: 11:00am-12:15pm, On the Implicit Optimization Bias of Next-Token Prediction
Location: Public Affairs Building 1222
Speaker: Christos Thrampoulidis, Assistant Professor
Department of Electrical and Computer Engineering, University of British Columbia
Abstract:
The talk explores optimization principles of next-token prediction (NTP), which has become the go-to paradigm for training modern language models. We frame NTP as cross-entropy optimization across distinct contexts, each tied to a sparse conditional probability distribution over a finite vocabulary. This leads us to introduce “NTP-separability conditions,” which enable reaching the entropy lower bound of the NTP objective. We then focus on NTP-trained linear models, for which we fully specify the optimization bias of gradient descent. Our analysis highlights the key role played by the sparsity pattern of the contexts’ conditional distributions and introduces an NTP-specific notion of margin. We also investigate a log-bilinear NTP model, which abstracts sufficiently expressive language models: In large embedding spaces, we can characterize the geometry of word and context embeddings in relation to an NTP-margin-maximizing logit matrix, which separates out-of-support words. Through experiments, we show how this optimization perspective establishes new links between geometric features of the embeddings and textual structures.
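For readers less familiar with the setup, the toy computation below (not from the talk) frames NTP as a context-weighted cross-entropy over sparse conditional next-token distributions and checks it against the entropy lower bound mentioned above; all contexts, probabilities, and variable names are illustrative.

```python
import numpy as np

# Toy setup: 3 distinct contexts over a vocabulary of 5 words.
# Each context carries a *sparse* conditional next-token distribution.
p_cond = np.array([
    [0.7, 0.3, 0.0, 0.0, 0.0],   # context A: support {0, 1}
    [0.0, 0.0, 1.0, 0.0, 0.0],   # context B: deterministic next token
    [0.5, 0.0, 0.0, 0.5, 0.0],   # context C: support {0, 3}
])
ctx_freq = np.array([0.5, 0.3, 0.2])  # how often each context occurs in training

def ntp_loss(logits):
    """Context-weighted cross-entropy of the model's softmax against p_cond."""
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -(ctx_freq[:, None] * p_cond * logp).sum()

# Entropy lower bound: attained only when the softmax matches p_cond exactly,
# which requires the logits of out-of-support words to diverge to -infinity.
with np.errstate(divide="ignore", invalid="ignore"):
    entropy_bound = -np.nansum(ctx_freq[:, None] * p_cond * np.log(p_cond))

finite_logits = np.log(p_cond + 1e-9)   # near-optimal, but still finite, logits
print("NTP loss:", ntp_loss(finite_logits), " entropy bound:", entropy_bound)
```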
Biography:
Dr. Thrampoulidis is an Assistant Professor in the Department of Electrical and Computer Engineering at the University of British Columbia. Previously, he was an Assistant Professor at the University of California, Santa Barbara and a Postdoctoral Researcher at MIT. He received an M.Sc. and a Ph.D. in Electrical Engineering from Caltech in 2012 and 2016, respectively, with a minor in Applied and Computational Mathematics. In 2011, he received a Diploma in ECE from the University of Patras, Greece. His research is on machine learning, high-dimensional statistics, and optimization.
Thursday, 05/23/2024, Time: 11:00am-12:15pm, Learning linear causal models via algebraic constraints
Location: Public Affairs Building 1222
Elina Robeva, Assistant Professor
Department of Mathematics, University of British Columbia
Abstract:
One of the main tasks of causal inference is to learn direct causal relationships among observed random variables. These relationships are usually depicted via a directed graph whose vertices are the variables of interest and whose edges represent direct causal effects. In this talk we will discuss the problem of learning such a directed graph for a linear causal model. We will specifically address the case where the graph may have hidden variables or directed cycles. In general, the causal graph cannot be learned uniquely from observational data. However, in the special case of linear non-Gaussian acyclic causal models, the directed graph can be found uniquely. When cycles are allowed the graph can be learned up to an equivalence class. We characterize the equivalence classes of such cyclic graphs and we propose algorithms for causal discovery. Our methods are based on using specific matrix rank constraints which hold among the second and higher order moments of the random vector and which can help identify the graph.
Biography:
Elina Robeva is an Assistant Professor in Mathematics at the University of British Columbia. Her work lies at the intersection of mathematical statistics, machine learning, and applied and multilinear algebra. Elina obtained her PhD in Mathematics in 2016 at the University of California, Berkeley under the supervision of Bernd Sturmfels. From 2016 to 2019 she was a Statistics Instructor and an NSF Postdoctoral Fellow at MIT. Since then she has received the SIAM Algebraic Geometry Early Career Prize, the UBC/PIMS Mathematical Sciences Young Faculty Award, the CAIMS/PIMS Early Career Research Award, and the André-Aisenstadt Prize. Since 2024, she has also been a Canada CIFAR AI Chair and a member of the Alberta Machine Intelligence Institute.
Thursday, 05/16/2024, Time: 11:00am-12:15pm, Learning linear models in-context with transformers
Location: Public Affairs Building 1222
Spencer Frei, Assistant Professor
Department of Statistics, UC Davis
Abstract:
Attention-based neural network sequence models such as transformers have the capacity to act as supervised learning algorithms: They can take as input a sequence of labeled examples and output predictions for unlabeled test examples. Indeed, recent work by Garg et al. has shown that when training GPT2 architectures over random instances of linear regression problems, these models’ predictions mimic those of ordinary least squares. Towards understanding the mechanisms underlying this phenomenon, we investigate the dynamics of in-context learning of linear predictors for a transformer with a single linear self-attention layer trained by gradient flow. We show that despite the non-convexity of the underlying optimization problem, gradient flow with a random initialization finds a global minimum of the objective function. Moreover, when given a prompt of labeled examples from a new linear prediction task, the trained transformer achieves small prediction error on unlabeled test examples. We further characterize the behavior of the trained transformer under distribution shifts. Talk based on joint work with Ruiqi Zhang and Peter Bartlett.
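For a concrete picture of the experimental setup described above, the hypothetical snippet below builds one random linear-regression prompt and computes the ordinary least squares prediction that, per Garg et al., a transformer trained on such prompts learns to mimic; the transformer itself is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_prompt = 5, 20             # input dimension, labeled examples in the prompt

# A fresh linear task: y = <w, x> with w drawn at random.
w = rng.normal(size=d)
X = rng.normal(size=(n_prompt, d))
y = X @ w
x_query = rng.normal(size=d)    # unlabeled test input appended to the prompt

# Ordinary least squares on the in-context examples: the predictor whose
# behavior the trained transformer's output is observed to mimic.
w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print("OLS in-context prediction:", x_query @ w_ols)
print("ground truth:             ", x_query @ w)
```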
Biography:
Spencer Frei is an Assistant Professor of Statistics at UC Davis. His research is on the foundations of deep learning, including topics related to benign overfitting, implicit regularization, and large language models. Prior to joining UC Davis he was a postdoctoral fellow at the Simons Institute for the Theory of Computing at UC Berkeley, hosted by Peter Bartlett and Bin Yu. He received his Ph.D. in Statistics at UCLA, supervised by Quanquan Gu and Ying Nian Wu. He co-organized a tutorial at NeurIPS 2023 on benign overfitting and the 2022 Deep Learning Theory Workshop and Summer School at the Simons Institute, and was named a Rising Star in Machine Learning by the University of Maryland.
Thursday, 05/09/2024, Time: 11:00am-12:15pm, Neural Approaches for Geometric (Optimization) Problems
Location: online only due to campus security status
Yusu Wang, Professor
Halıcıoğlu Data Science Institute and Computer Science & Engineering Department at UCSD
Abstract:
Machine learning, and especially the use of neural networks, has shown great success in a broad range of applications. Recently, neural approaches have also shown promise in tackling (combinatorial) optimization problems in a data-driven manner. On the other hand, for many problems, especially geometric optimization problems, many beautiful geometric ideas and algorithmic insights have been developed in fields such as theoretical computer science and computational geometry. Our goal is to infuse geometric and algorithmic ideas into the design of neural frameworks so that they can be more effective and generalize better. In this talk, I will give two examples in this direction. The first is what we call a mixed neural-algorithmic framework for the Steiner tree problem in Euclidean space, leveraging the celebrated PTAS of Arora. Interestingly, here the model complexity can be made independent of the size of the input point set. The second is a neural architecture for approximating the Wasserstein distance between point sets, whose design and analysis use a geometric coreset idea.
Bio:
Yusu Wang is currently Professor in the Halicioglu Data Science Institute at the University of California, San Diego, where she also serves as the Director of the NSF National AI Institute TILOS. Prior to joining UCSD, she was Professor in the Computer Science and Engineering Department at the Ohio State University. She obtained her PhD from Duke University in 2004, where she received the Best PhD Dissertation Award in the CS Department. From 2004 to 2005, she was a postdoctoral fellow at Stanford University. Yusu Wang primarily works in the fields of geometric and topological data analysis (with a textbook on Computational Topology for Data Analysis), especially the use of these ideas in modern machine learning. She received the DOE Early Career Principal Investigator Award in 2006 and the NSF CAREER Award in 2008. She is on the editorial boards of the SIAM Journal on Computing (SICOMP) and the Journal of Computational Geometry (JoCG). She is a member of the Computational Geometry Steering Committee and of the AATRN Advisory Committee, and also serves on the SIGACT CATCS committee and the AWM Meetings Committee.
Thursday, 05/02/2024, Time: 11:00am-12:15pm, A Framework for Trustworthiness Evaluation of AI Systems
Location: Public Affairs Building 1222 (it is recommended that participants attend the seminar remotely)
Arya Farahi, Assistant Professor
Department of Statistics and Data Science, University of Texas at Austin
Abstract:
As AI continues to permeate various facets of our society and contribute to scientific advancements, it becomes necessary to go beyond traditional metrics such as predictive accuracy and error rates and to assess the trustworthiness of AI systems. This talk introduces two novel frameworks, U-trustworthiness and I-trustworthiness, for evaluating AI system trustworthiness. First, I formalize U-trustworthiness, grounded in the competence-based theory of trust, for trustworthiness evaluation of AI systems focused on maximizing utility. In the second part, I introduce I-trustworthiness, which links conditional calibration to trustworthiness for inference tasks, and then propose the Kernel Local Calibration Error (KLCE) for assessing the trustworthiness of probabilistic classifiers. Both frameworks challenge traditional evaluation metrics through theoretical and experimental validation, offering new methods for improving the reliability and responsibility of AI implementations.
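The snippet below is not the speaker's KLCE; it is only a generic kernel-smoothed check of conditional calibration, included to illustrate what assessing calibration locally, rather than on average, means. All data and parameter choices are made up.

```python
import numpy as np

def local_calibration_curve(probs, labels, grid, bandwidth=0.05):
    """Kernel-smoothed estimate of E[Y | p_hat = s] on a grid of scores s.
    A conditionally calibrated classifier satisfies E[Y | p_hat = s] = s."""
    probs, labels = np.asarray(probs), np.asarray(labels)
    curve = []
    for s in grid:
        w = np.exp(-0.5 * ((probs - s) / bandwidth) ** 2)   # Gaussian kernel weights
        curve.append((w * labels).sum() / (w.sum() + 1e-12))
    return np.array(curve)

rng = np.random.default_rng(1)
p_true = rng.uniform(size=5000)
y = rng.binomial(1, p_true)
p_hat = np.clip(p_true + 0.15 * (p_true - 0.5), 0, 1)   # a miscalibrated classifier

grid = np.linspace(0.05, 0.95, 19)
gap = np.abs(local_calibration_curve(p_hat, y, grid) - grid)
print("largest local calibration gap on the grid:", round(gap.max(), 3))
```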
Bio:
Arya Farahi is an Assistant Professor in the Department of Statistics and Data Science at the University of Texas at Austin, where he leads the D3 Lab. His research primarily explores the under-explored aspects of AI models, such as explainability, trustworthiness, bias, and uncertainty quantification; and develops methodologies to address these issues. His current work emphasizes the development of trust frameworks which seek to enhance the reliability and transparency of AI systems, aligning with his commitment to responsible use of AI. Before his current role, Arya was a Data Science Fellow at the Michigan Institute for Data Science and a McWilliams Fellow at Carnegie Mellon University. He holds a Ph.D. in Physics and Scientific Computing from the University of Michigan.
Thursday, 04/25/2024, Time: 11:00am-12:15pm, Adaptive MOSUM: inference for multiscale breaks in high dimensional time series
Location: Public Affairs Building 1222
Likai Chen, Assistant Professor
Department of Mathematics and Statistics, Washington University in St. Louis
Abstract:
The moving sum (MOSUM) test statistic is popular for multiple change-point detection due to its simplicity of implementation and effective control of the significance level under multiple testing. However, its performance heavily relies on the selection of the bandwidth parameter for the window size, which is extremely difficult to determine in advance. To address this issue, we propose an adaptive MOSUM method, applicable to both multiple and high-dimensional time series models. Specifically, we adopt an l2-norm to aggregate MOSUM statistics cross-sectionally, and take the maximum over time and over bandwidth candidates. We provide the asymptotic distribution of the test statistics, accommodating general weak temporal and cross-sectional dependence. By employing a screening procedure, we can consistently estimate the number of change points, and we present convergence rates for the estimated timestamps and sizes of the breaks. The asymptotic properties and the estimation precision are demonstrated by extensive simulation studies. Furthermore, we present an application using real-world COVID-19 data from Brazil, wherein we observe distinct outbreak stages among subjects of different age groups and geographic locations. These findings may facilitate the analysis of epidemics, pandemics, and data from various fields exhibiting similar patterns.
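A simplified sketch of the aggregation scheme described above, with an illustrative normalization rather than the paper's exact statistic or critical values:

```python
import numpy as np

def mosum_stats(X, G):
    """MOSUM statistics for a (T, p) panel X with bandwidth G: difference of
    forward and backward window means, scaled to roughly unit variance."""
    T, p = X.shape
    cs = np.vstack([np.zeros(p), np.cumsum(X, axis=0)])
    t = np.arange(G, T - G)                    # admissible centre points
    fwd = (cs[t + G] - cs[t]) / G              # mean over (t, t+G]
    bwd = (cs[t] - cs[t - G]) / G              # mean over (t-G, t]
    return np.sqrt(G / 2.0) * (fwd - bwd)      # shape (T - 2G, p)

def adaptive_mosum(X, bandwidths):
    """l2-aggregate across coordinates, then maximize over time and bandwidths."""
    return max(np.linalg.norm(mosum_stats(X, G), axis=1).max() for G in bandwidths)

rng = np.random.default_rng(2)
T, p = 600, 50
X = rng.normal(size=(T, p))
X[300:, :5] += 1.0                             # a break affecting 5 coordinates
print("with a break:   ", round(adaptive_mosum(X, [20, 40, 80]), 2))
print("without a break:", round(adaptive_mosum(rng.normal(size=(T, p)), [20, 40, 80]), 2))
```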
Bio:
Likai Chen is an Assistant Professor at Washington University in St. Louis. He graduated from the University of Chicago in 2018, under the supervision of Wei Biao Wu.
Thursday, 04/18/2024, Time: 11:00am-12:15pm, Advances in Watermarking Large Language Models
Location: Public Affairs Building 1222
Yu-Xiang Wang, Associate Professor and Eugene Aas Chair
Department of Computer Science and Department of Statistics, UC Santa Barbara
Abstract:
As the digital age progresses, artificial intelligence (AI), especially in the form of large language models (LLMs) like ChatGPT, has become increasingly influential in our daily lives. These technologies have the power to write essays, generate news articles, and even create realistic conversations. However, this capability also presents risks such as the generation of misinformation, academic dishonesty, and cybersecurity threats. A promising recent approach to mitigating these challenges is “watermarking”. By injecting subtle statistical signals into the LLM generation process, one can reliably detect AI-generated texts or even attribute downstream models trained on these texts. In this talk, I will describe existing attempts at formalizing this problem and discuss two recent works of ours: (1) the Unigram (Green-Red) watermark https://arxiv.org/abs/2306.17439; (2) the Permute-and-Flip watermark https://arxiv.org/abs/2402.05864.
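A schematic of the detection side of a green-red watermark of the kind referenced above: count how many tokens fall in a fixed pseudorandom "green" subset of the vocabulary and compute a one-sided z-score. Key handling, the green-list fraction, and the generation-side bias are all simplified relative to the papers linked above.

```python
import numpy as np

rng = np.random.default_rng(42)
vocab_size, gamma = 50_000, 0.5               # gamma = fraction of vocabulary marked green
green = rng.random(vocab_size) < gamma        # fixed pseudorandom green list (shared key)

def detection_z_score(token_ids):
    """One-sided z-score for 'more green tokens than chance' in a token sequence."""
    n = len(token_ids)
    n_green = green[np.asarray(token_ids)].sum()
    return (n_green - gamma * n) / np.sqrt(n * gamma * (1 - gamma))

n_tokens = 400
human_text = rng.integers(0, vocab_size, size=n_tokens)             # no watermark
green_ids = np.flatnonzero(green)
watermarked = np.where(rng.random(n_tokens) < 0.9,                  # sampler biased
                       rng.choice(green_ids, size=n_tokens),        # toward green tokens
                       rng.integers(0, vocab_size, size=n_tokens))

print("z-score (human):      ", round(detection_z_score(human_text), 2))
print("z-score (watermarked):", round(detection_z_score(watermarked), 2))
```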
Biography:
Yu-Xiang Wang is the Eugene Aas Associate Professor of Computer Science at UCSB. He directs the Statistical Machine Learning lab and co-founded the UCSB Center for Responsible Machine Learning. Yu-Xiang received his PhD in 2017 from Carnegie Mellon University (CMU). Yu-Xiang’s research interests include statistical theory and methodology, differential privacy, reinforcement learning, online learning, and deep learning. His work has been supported by an NSF CAREER Award, an Amazon ML Research Award, a Google Research Scholar Award, and an Adobe Data Science Research Award, and has received paper awards from KDD’15, WSDM’16, AISTATS’19, and COLT’21.
Thursday, 04/11/2024, Time: 11:00am-12:15pm, Consistency Models
Location: Public Affairs Building 1222
Yang Song, Researcher at OpenAI, Strategic Explorations team and (Incoming) Assistant Professor, Electrical Engineering and Computing + Mathematical Sciences, California Institute of Technology (Caltech)
Abstract:
We will explore consistency models, a new family of methods designed to mitigate the slow generation limitations of diffusion models. These models rapidly generate high-quality samples from noise in a single step but also allow for multi-step sampling to trade compute for quality. Like diffusion models, they enable zero-shot data editing tasks such as image inpainting, colorization, and super-resolution, without the need for task-specific training. Consistency models can be trained through knowledge distillation from pre-existing diffusion models or built from scratch as standalone generative models. Experiments showcase their superiority over existing diffusion distillation techniques for one- and few-step sampling. When implemented independently, they excel on various image datasets, emerging as strong contenders against traditional one-step generative models like normalizing flows and VAEs.
Biography:
Yang Song leads the Strategic Explorations team at OpenAI and is also an incoming Assistant Professor at Caltech. His research interests include data efficiency, training efficiency, and inference efficiency of multimodal generative models.
Thursday, 04/04/2024, Time: 11:00am-12:15pm, Theoretical Foundations of Feature Learning
Location: Public Affairs Building 1222
Mahdi Soltanolkotabi, Associate Professor
Departments of Electrical and Computer Engineering and Computer Science, University of Southern California
Abstract:
One of the major transformations in modern learning is that contemporary models trained through gradient descent have the ability to learn versatile representations that can then be applied effectively across a broad range of downstream tasks. Existing theory, however, suggests that neural networks, when trained via gradient descent, behave similarly to kernel methods, which fail to learn representations that can be transferred. In the first part of this talk I will try to bridge this discrepancy by showing that gradient descent on neural networks can indeed learn a broad spectrum of functions that kernel methods struggle with, by acquiring task-relevant representations. In the second part of the talk I will focus on feature learning in prompt-tuning, which is an emerging strategy to adapt large language models (LLMs) to downstream tasks by learning a (soft-)prompt parameter from data. We demystify how prompt-tuning enables the model to focus attention on context-relevant information/features.
Biography:
Mahdi Soltanolkotabi is the director of the center on AI Foundations for the Sciences (AIF4S) at the University of Southern California. He is also an associate professor in the Departments of Electrical and Computer Engineering, Computer Science, and Industrial and Systems Engineering, where he holds an Andrew and Erna Viterbi Early Career Chair. Prior to joining USC, he completed his PhD in electrical engineering at Stanford in 2014. He was a postdoctoral researcher in the EECS department at UC Berkeley during the 2014-2015 academic year. Mahdi is the recipient of the Information Theory Society Best Paper Award, a Packard Fellowship in Science and Engineering, an NIH Director’s New Innovator Award, a Sloan Research Fellowship, an NSF CAREER Award, an Air Force Office of Scientific Research Young Investigator Award (AFOSR YIP), the Viterbi School of Engineering Junior Faculty Research Award, and faculty awards from Google and Amazon. His research focuses on developing the mathematical foundations of modern data science by characterizing the behavior and pitfalls of contemporary nonconvex learning and optimization algorithms, with applications in deep learning, large-scale distributed training, federated learning, computational imaging, and AI for scientific and medical applications.
Wednesday, 03/13/2024, Time: 3:30pm-4:45pm, Beyond Global Correlations
Location: Royce Hall 164 and online (please note special time and location)
Dr. Hongyu Zhao, Professor of Biostatistics, Genetics, and Statistics and Data Science
Yale University
Abstract:
Correlations are one of the most commonly used statistics for quantifying the dependence between two variables. Although global correlations calculated using all the data points collected in a study are informative, some distinct and important local and context-dependent patterns may be masked by global measures. In this presentation, we will discuss the need, benefits, and challenges of inferring local and context-dependent correlations in genetics and genomics studies. We will focus on examples of local genetic correlations, used to identify genomic regions with shared effects on different complex traits, and of cell-type-specific correlations, used to identify co-regulated genes in disease-relevant cell types.
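A toy numerical example (not from the talk) of the phenomenon motivating this work: a global correlation can be near zero while two contexts, say two cell types, carry strong and opposite correlations.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000

def correlated_pair(rho, size):
    """Draw (x, y) with population correlation rho."""
    x = rng.normal(size=size)
    y = rho * x + np.sqrt(1 - rho**2) * rng.normal(size=size)
    return x, y

# Two hypothetical contexts (e.g., cell types) with opposite gene-gene correlations.
x1, y1 = correlated_pair(+0.7, n)
x2, y2 = correlated_pair(-0.7, n)

x, y = np.concatenate([x1, x2]), np.concatenate([y1, y2])
print("global correlation:      ", round(np.corrcoef(x, y)[0, 1], 3))    # near 0
print("correlation in context 1:", round(np.corrcoef(x1, y1)[0, 1], 3))  # near +0.7
print("correlation in context 2:", round(np.corrcoef(x2, y2)[0, 1], 3))  # near -0.7
```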
Biography:
Dr. Hongyu Zhao is the Ira V. Hiscock Professor of Biostatistics, Professor of Genetics, and Professor of Statistics and Data Science at Yale University. He received his BS in Probability and Statistics from Peking University in 1990 and his PhD in Statistics from UC Berkeley in 1995. His research interests are the development and application of statistical methods in molecular biology, genetics, drug development, and precision medicine, with a recent focus on biobank and single-cell analysis. He is an elected fellow of the American Association for the Advancement of Science, the American Statistical Association, the Institute of Mathematical Statistics, and the Connecticut Academy of Science and Engineering. He received the Mortimer Spiegelman Award, given by the American Public Health Association to a top statistician in health statistics, and the Pao-Lu Hsu Prize from the International Chinese Statistical Association.
Thursday, 03/07/2024, Time: 3:30pm-4:45pm, The Factuality and Generalization Crisis of LLMs
Dr. Saadia Gabriel, Data Science Faculty Fellow at New York University and incoming UCLA Computer Science Assistant Professor
Abstract:
Large language models (LLMs) like ChatGPT are increasingly determining the course of our everyday lives. They can decide what content we are likely to see on social media. Their involvement has also been proposed in cases of potential life-or-death decisions like mental health treatment. So what happens when LLMs break? What are the risks posed from deceitful LLM behavior or their failure to perform a task correctly? How do we mitigate the negative effects of these failures on society and prevent discriminatory decision-making?
In this talk, I discuss the growing disconnect between scalability and safety in LLMs. I start with a positive use case of LLM persuasion – personalized interventions for mitigation of misinformation. I show that general-purpose LLMs can be steered through prompting to generate effective content reliability explanations conditioned on user preferences. In randomized controlled trials, we find that LLM explanations lead to significant differences in user behavior towards true and false claims. Next, I explore risks of such an intervention framework should it be deployed. In particular, I highlight risks from the propensity of LLMs for factual inaccuracies known as hallucinations and social biases.
Lastly, I discuss two recent works on deployment of LLMs in healthcare, done in collaboration with NYU Langone hospitals. I show that despite seemingly strong general performance, LLM behavior exhibits potentially serious inequities in the healthcare domain across demographic subgroups. This could lead to faulty recommendations without proper documentation of the disparity. I conclude by proposing simple and effective strategies for encouraging more equitable decision-making.
Biography:
Dr. Saadia Gabriel is a Data Science Faculty Fellow at NYU and an incoming UCLA Computer Science Assistant Professor. Her research revolves around natural language processing and machine learning, with a particular focus on building systems for understanding how social commonsense manifests in text (i.e., how people typically behave in social scenarios), as well as on mitigating the spread of false or harmful text (e.g., Covid-19 misinformation). Her work has been covered by a wide range of media outlets, including Forbes and TechCrunch. It has also received a 2019 ACL best short paper nomination, a 2019 IROS RoboCup best paper nomination, a best paper award at the 2020 WeCNLP summit, and a 2023 MIT Generative AI Impact award. In 2024, she was named on Forbes’ 30 under 30 list. She was previously an MIT CSAIL Postdoctoral Fellow and received her PhD from the University of Washington.
Thursday, 02/29/2024, Time: 3:30pm-4:45pm, Approaches on Multiple Testing for Studying Brain Activations
Michele Guindani, Professor
UCLA Department of Biostatistics
Abstract:
In this talk, we will discuss two approaches for multiple testing, focusing specifically on the identification of brain activations. The first part of the presentation will explore the challenges in identifying specific intervals of brain activity in an ECoG experiment, as a function of experimental conditions and other predictors. We refer to this as a problem of “local variable selection”. We discuss the limitations of common semi-parametric methods, notably their propensity for high rates of false positives due to slight model misspecifications. To address these issues, we propose a methodology based on orthogonal cut splines, designed for consistent local variable selection in high-dimensional settings. This approach is notable for its simplicity and versatility, capable of handling both continuous and discrete covariates, and providing robustness against model misspecification in various data settings, including independent and dependent data. The second part of the talk shifts focus to a novel Bayesian approach for identifying differentially activated brain regions using light sheet fluorescence microscopy, a recently developed technique for whole-brain imaging. Most existing statistical methods solve this problem by partitioning the brain regions into two classes: significantly and nonsignificantly activated regions. However, for the brain imaging problem at the center of our study, such binary grouping may yield overly simplistic discoveries by filtering out weak but important signals that are typically adulterated by the noise present in the data. To overcome this limitation, we introduce a new Bayesian approach that classifies the brain regions into several tiers with varying degrees of relevance. We show that this approach leads to more biologically meaningful and interpretable results in our brain imaging problem, since it allows discrimination between active and inactive regions while at the same time ranking the discoveries into clusters representing tiers of similar importance.
Biography:
Dr. Michele Guindani is a Professor in the Department of Biostatistics at the University of California, Los Angeles. Before joining UCLA in 2022, he held faculty positions at UC Irvine, the MD Anderson Cancer Center, and the University of New Mexico. He is a Fellow of the American Statistical Association (ASA) and of the International Society for Bayesian Analysis (ISBA). He is currently a member of the Executive Council and President-Elect of ISBA (2024-2026). He is also chair of the Section on Bayesian Statistical Sciences of the ASA, chair-elect of the Section on Statistical Imaging of the ASA, and Membership Engagement Chair of Section U (Statistics) of the American Association for the Advancement of Science (AAAS). He has served as Editor-in-Chief of “Bayesian Analysis”, the official journal of ISBA (2018-2021), and he is now serving as Associate Editor for the “Journal of the American Statistical Association, Theory and Methods”, “Biometrics”, and “Econometrics and Statistics”.
Thursday, 02/22/2024, Time: 3:30pm-4:45pm, Towards Fast Mixing MCMC Methods for Structure Learning
Quan Zhou, Assistant Professor
Department of Statistics, Texas A&M University
Abstract:
This talk focuses on Markov chain Monte Carlo (MCMC) methods for structure learning of high-dimensional directed acyclic graph (DAG) models, a problem known to be very challenging because of the enormous search space and the existence of Markov equivalent DAGs. In the first part of the talk, we consider a random walk Metropolis-Hastings algorithm on the space of Markov equivalence classes and show that it has a rapid mixing guarantee under some high-dimensional assumptions; in other words, the complexity of Bayesian learning of sparse equivalence classes grows only polynomially in n and p. In the second part of the talk, we propose an empirical Bayes formulation of the structure learning problem, where the prior assumes that all node variables have the same error variance, an assumption known to ensure the identifiability of the underlying DAG. Strong selection consistency for our model is proved under an assumption weaker than equal error variance. To evaluate the posterior distribution, we devise an order-based MCMC sampler and investigate its mixing behavior theoretically. Numerical studies reveal that, interestingly, imposing the equal variance assumption tends to facilitate the mixing of MCMC samplers and improve the posterior inference even when the model is misspecified.
Biography:
Dr. Quan Zhou is an Assistant Professor of Statistics at Texas A&M University. He received his Ph.D. in statistical genetics from Baylor College of Medicine in 2017 and then spent two years as a postdoctoral research fellow at the Department of Statistics of Rice University. He has worked on the statistical methodology for variable selection, graphical models and randomized controlled trials, and his current research centers on Markov chain Monte Carlo sampling methods and stochastic control problems in data science.
Thursday, 02/15/2024, Time: 3:30pm-4:45pm, Robust Fitted-Q-Evaluation and Iteration under Sequentially Exogenous Unobserved Confounders
Dr. Angela Zhou, Assistant Professor
Data Sciences and Operations, USC Marshall School of Business
Abstract:
Offline reinforcement learning is important in domains such as medicine, economics, and e-commerce where online experimentation is costly, dangerous, or unethical, and where the true model is unknown. However, most methods assume that all covariates used in the behavior policy’s action decisions are observed. Though this assumption, sequential ignorability/unconfoundedness, likely does not hold in observational data, most of the data that accounts for selection into treatment may be observed, motivating sensitivity analysis. We study robust policy evaluation and policy optimization in the presence of sequentially exogenous unobserved confounders under a sensitivity model. We propose and analyze orthogonalized robust fitted-Q-iteration, which uses closed-form solutions of the robust Bellman operator to derive a loss minimization problem for the robust Q function, and which adds a bias correction to quantile estimation. Our algorithm enjoys the computational ease of fitted-Q-iteration and statistical improvements (reduced dependence on quantile estimation error) from orthogonalization. We provide sample complexity bounds and insights, and show effectiveness both in simulations and on real-world longitudinal healthcare data on treating sepsis. In particular, our model of sequential unobserved confounders yields an online Markov decision process, rather than a partially observed Markov decision process: we illustrate how this can enable warm-starting optimistic reinforcement learning algorithms with valid robust bounds from observational data.
Biography:
Dr. Angela Zhou is an Assistant Professor at University of Southern California, Marshall School of Business in Data Sciences and Operations. She previously obtained her PhD from Cornell University (based at Cornell Tech). Her research interests are in statistical machine learning for data-driven sequential decision making under uncertainty, causal inference, and the interplay of statistics and optimization. She is particularly interested in applications-motivated methodology with guarantees in order to bridge method and practice. She was a co-program chair for ACM conference on Equity and Access in Algorithms, Mechanisms, and Optimization (EAAMO).
Thursday, 02/08/2024, Time: 3:30pm-4:45pm, Data-efficient Training and Pre-training of Deep Networks
Dr. Baharan Mirzasoleiman, Assistant Professor
Department of Computer Science, UCLA
Abstract:
Large datasets have enabled over-parameterized neural networks to achieve unprecedented success. However, training such models, with millions or billions of parameters, on large data requires expensive computational resources, which consume substantial energy, leave a massive carbon footprint, and often soon become obsolete and turn into e-waste. To improve the efficiency, reliability, and sustainability of learning deep models, I discuss the first scalable framework with rigorous theoretical guarantees to identify the most valuable examples for supervised training and self-supervised pre-training of deep networks. I also demonstrate the effectiveness of such a framework for training and pre-training various over-parameterized models on different vision and NLP benchmark datasets.
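As a rough illustration of the general idea, and not of the speaker's framework, the sketch below greedily selects a small subset of examples whose average gradient tracks the full-data average gradient; all sizes, names, and the use of raw per-example gradients are illustrative.

```python
import numpy as np

def greedy_gradient_matching(grads, k):
    """Greedily pick k examples whose running mean gradient best matches the
    full-data mean gradient (a toy stand-in for data-subset selection)."""
    n, _ = grads.shape
    target = grads.mean(axis=0)
    chosen, running = [], np.zeros_like(target)
    for _ in range(k):
        errs = [np.linalg.norm((running + grads[i]) / (len(chosen) + 1) - target)
                if i not in chosen else np.inf
                for i in range(n)]
        best = int(np.argmin(errs))
        chosen.append(best)
        running += grads[best]
    return chosen

rng = np.random.default_rng(4)
grads = rng.normal(size=(500, 32))        # toy per-example gradients
subset = greedy_gradient_matching(grads, k=25)
err = np.linalg.norm(grads[subset].mean(axis=0) - grads.mean(axis=0))
print("selected", len(subset), "of 500 examples; gradient-matching error:", round(err, 3))
```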
Biography:
Dr. Baharan Mirzasoleiman is an Assistant Professor in the Computer Science Department at UCLA. Before joining UCLA, she was a postdoctoral research fellow in Computer Science working with Jure Leskovec. She received her Ph.D. in CS from ETH Zurich, advised by Andreas Krause. She is the recipient of an ETH Medal for an Outstanding Doctoral Thesis and of the NSF CAREER Award.
Her research aims to address sustainability, reliability and efficiency of machine learning by developing theoretically rigorous methods to select the most beneficial data for efficient and robust learning. She is also interested in improving models and learning algorithms. The resulting methodology is broadly applicable across a wide range of applications, including medical diagnosis and environmental sensing.
Thursday, 02/01/2024, Time: 3:30pm-4:45pm, Likelihood Free Learning of Spatiotemporal Hawkes Processes
Mouli Banerjee, Professor
Department of Statistics, University of Michigan
Location: Mathematical Sciences 5200
Abstract:
Hawkes processes are quite popular for analyzing spatiotemporal data with triggering effects and have been used as a tool for algorithmic threat detection. However, in real applications, complete data on sample paths are usually unavailable (e.g., unreported crime), whilst (estimates of) missing rates may be known. Because the intensity function of a Hawkes process depends on past events, this makes likelihood-based methods like EM essentially infeasible. On the other hand, minimum distance estimates (MDE) based on Wasserstein distances are readily computable using GAN training, as samples from a Hawkes process with a fixed set of parameters can be readily generated. We illustrate the use of such MDE estimates to learn the parameters of Hawkes processes and present applications to predictive policing. We also investigate the theoretical properties of the estimators by invoking recent work on entropy-regularized optimal transport theory.
This is joint work with Pramit Das, Yuekai Sun and Yue Yu.
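The estimator in the talk is trained GAN-style; the hypothetical sketch below only illustrates two of its ingredients: simulating a Hawkes process for a candidate parameter set (Ogata thinning with an exponential kernel) and scoring candidates with a crude one-dimensional Wasserstein discrepancy, here between inter-event-time distributions.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def simulate_hawkes(mu, alpha, beta, T, rng):
    """Ogata thinning for a univariate Hawkes process with intensity
    lambda(t) = mu + alpha * sum_{t_i < t} exp(-beta * (t - t_i))."""
    events, t = [], 0.0
    while True:
        lam_bar = mu + alpha * np.exp(-beta * (t - np.array(events))).sum()  # bound at t+
        t += rng.exponential(1.0 / lam_bar)
        if t > T:
            return np.array(events)
        lam_t = mu + alpha * np.exp(-beta * (t - np.array(events))).sum()
        if rng.random() < lam_t / lam_bar:     # accept with probability lambda(t) / bound
            events.append(t)

rng = np.random.default_rng(5)
observed = simulate_hawkes(mu=0.5, alpha=0.8, beta=1.5, T=500, rng=rng)

# Score candidate parameters by the Wasserstein distance between the
# inter-event-time distributions of the observed and a simulated path.
for alpha_cand in (0.2, 0.8):
    sim = simulate_hawkes(mu=0.5, alpha=alpha_cand, beta=1.5, T=500, rng=rng)
    d = wasserstein_distance(np.diff(observed), np.diff(sim))
    print(f"candidate alpha={alpha_cand}: discrepancy = {d:.3f}")
```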
Biography:
Moulinath Banerjee completed his bachelor’s and master’s degrees in statistics at the Indian Statistical Institute in 1995 and 1997, respectively, and then wrote a doctoral dissertation, Likelihood Ratio Inference in Regular and Nonregular Problems, in 2000, advised by Jon A. Wellner of the University of Washington. He remained at the University of Washington as a lecturer until joining the University of Michigan faculty in 2001. His research interests comprise non-standard statistical models, shape-constrained methods, empirical process theory, distributed computing, learning across environments, and, more recently, applications of optimal transport at the statistics and machine learning interface. Apart from his statistical pursuits, he takes an avid interest in classical music, fine dining, literature, and philosophy, and together with a co-author has published a new translation of the Rubaiyat of Omar Khayyam from the original Persian. He is an elected fellow of both the ASA and the IMS, an IMS Medallion Lecture awardee for 2024, and the current Editor of IMS’s review journal, Statistical Science.
Thursday, 01/18/2024, Time: 3:30pm-4:45pm, Privacy in Advertising: Analytics and Modeling
Dr. Badih Ghazi, Research Scientist
Google Research
Location: Mathematical Sciences 5200
Abstract:
Privacy in general, and differential privacy (DP) in particular, have become important topics in data mining and machine learning. Digital advertising is a critical component of the internet and is powered by large-scale data analytics and machine learning models; privacy concerns around these are on the rise. This talk will provide an introduction to the problems that arise in private analytics and modeling in advertising, survey recent results, and describe the main research challenges in the space.
Biography:
Dr. Ghazi received his Ph.D. in Electrical Engineering and Computer Science from MIT in 2018, working with Drs. Madhu Sudan and Ronitt Rubinfeld, and joined Google Research after graduation. From September 2015 to February 2018, he was a visiting student in the Theory of Computation Group at Harvard. Previously, he received an M.S. in EECS, also from MIT, and a B.Eng. in Computer and Communications Engineering from the American University of Beirut, working with Dr. Louay Bazzi. During the 2017-2018 academic year, he was supported by an IBM Ph.D. Fellowship, and during the 2012-2013 academic year, he was supported by an MIT Irwin and Joan Jacobs Presidential Fellowship.
He has also served as area chair for NeurIPS 2021, 2022, 2024, ICLA 2024 and ICLR 2024.
Dr. Ghazi’s current research interests include algorithmic aspects of machine learning, differential privacy, error-correcting codes and communication under uncertainty.
Thursday, 11/30/2023, Time: 11:00am – 12:00pm, Contained Chaos: Quality Assurance for the Community Earth System Model
This seminar is part of the Joint Statistics and Biostatistics Seminar Series.
Dorit Hammerling, Associate Professor
Department of Applied Mathematics and Statistics, Colorado School of Mines
Location: online
Abstract:
State-of-the-science climate models are valuable tools for understanding past and present climates and are particularly vital for addressing otherwise intractable questions about future climate scenarios. The National Center for Atmospheric Research leads the development of the popular Community Earth System Model (CESM), which models the Earth system by simulating the major Earth system components (e.g., atmosphere, ocean, land, river, ice, etc.) and the interactions between them. These complex processes result in a model that is inherently chaotic, meaning that small perturbations can cause large effects. For this reason, ensemble methods are common in climate studies, as a collection of simulations is needed to understand and characterize this uncertainty in the climate model system. While climate scientists typically use initial-condition perturbations to create ensemble spread, similar effects can result from seemingly minor changes to the hardware or software stack. This sensitivity makes quality assurance challenging, and defining “correctness” separately from bit-reproducibility is a practical necessity. Our approach casts correctness in terms of statistical distinguishability, so that the problem becomes one of making decisions under uncertainty in a high-dimensional variable space. We developed a statistical testing framework that can be thought of as hypothesis testing combined with Principal Component Analysis (PCA). One key advantage of this approach for settings with hundreds of output variables is that it captures not only changes in individual variables but the relationships between variables as well. This testing framework, referred to as “Ensemble Consistency Testing”, has been successfully implemented and used for the last few years, and we will provide an overview of this multi-year effort and highlight ongoing developments. Two good background papers for this talk are:
- Quality Assurance and Error Identification for the Community Earth System Model. Proceedings of the First Int’l Workshop on Software Correctness for HPC Applications, 2017.
- A statistical investigation of the CESM ensemble consistency testing framework. NCAR technical note, 2018.
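A drastically simplified, hypothetical version of such a PCA-based consistency test: a new run is scored against the distribution of a control ensemble in principal-component space and flagged if too many standardized scores are extreme. The actual CESM testing framework involves far more output variables and careful calibration of thresholds.

```python
import numpy as np

def consistency_test(ensemble, new_run, n_pcs=10, z_thresh=2.0, max_fail=2):
    """Project a new run onto the control ensemble's principal components and
    flag it if more than max_fail standardized PC scores exceed z_thresh."""
    mean, std = ensemble.mean(axis=0), ensemble.std(axis=0) + 1e-12
    Z = (ensemble - mean) / std                        # standardize each variable
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:n_pcs].T                          # ensemble PC scores
    new_scores = ((new_run - mean) / std) @ Vt[:n_pcs].T
    z = np.abs(new_scores - scores.mean(axis=0)) / (scores.std(axis=0) + 1e-12)
    n_fail = int((z > z_thresh).sum())
    return n_fail <= max_fail, n_fail

rng = np.random.default_rng(6)
n_runs, n_vars = 200, 30
ensemble = rng.normal(size=(n_runs, n_vars))           # e.g. global means of output variables
ok_run = rng.normal(size=n_vars)                       # statistically indistinguishable run
odd_run = 4.0 * rng.normal(size=n_vars)                # run with inflated variability

print("consistent run (passes?, #failing PCs):", consistency_test(ensemble, ok_run))
print("perturbed run  (passes?, #failing PCs):", consistency_test(ensemble, odd_run))
```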
Bio:
Dr. Hammerling obtained an M.A. and a Ph.D. (2012) in Statistics and Engineering from the University of Michigan, followed by a postdoctoral fellowship at the Statistical and Applied Mathematical Sciences Institute in the program on statistical inference for massive data. She then joined the National Center for Atmospheric Research, where she led the statistics group within the Institute for Mathematics Applied to the Geosciences and worked in the Machine Learning division before becoming an Associate Professor in Applied Mathematics and Statistics at the Colorado School of Mines in January 2019.
Wednesday, 11/29/2023, Time: 3:30pm-4:45pm, Individualized Dynamic Model for Multi-resolutional Data
This seminar is part of the Joint Statistics and Biostatistics Seminar Series.
Annie Qu, Chancellor’s Professor
Department of Statistics, University of California – Irvine
Location: 43-105 CHS Building
Abstract:
Mobile health has emerged as a major success in tracking individual health status, due to the popularity and power of smartphones and wearable devices. This has also brought great challenges in handling the heterogeneous, multi-resolution data which arise ubiquitously in mobile health due to irregular multivariate measurements collected from individuals. In this talk, we propose an individualized dynamic latent factor model for irregular multi-resolution time series data that interpolates unsampled measurements of low-resolution time series. One major advantage of the proposed method is the capability to integrate multiple irregular time series and multiple subjects by mapping the multi-resolution data to the latent space. In addition, the proposed individualized dynamic latent factor model is able to capture heterogeneous longitudinal information through individualized dynamic latent factors. In theory, we provide the interpolation error bound of the proposed estimator and derive the convergence rate with non-parametric approximation methods. Both the simulation studies and the application to smartwatch data demonstrate the superior performance of the proposed method compared to existing methods.
Bio:
Professor Qu’s research focuses on solving fundamental issues regarding structured and unstructured large-scale data, and developing cutting-edge statistical methods and theory in machine learning and algorithms on personalized medicine, text mining, recommender systems, medical imaging data and network data analyses for complex heterogeneous data.
Qu received her Ph.D. in Statistics from the Pennsylvania State University. Before joining UC Irvine, Professor Qu was Data Science Founder Professor of Statistics and the Director of the Illinois Statistics Office at the University of Illinois at Urbana-Champaign. She was named a Brad and Karen Smith Professorial Scholar by the College of LAS at UIUC and was a recipient of an NSF CAREER Award (2004-2009). She is a Fellow of the Institute of Mathematical Statistics, a Fellow of the American Statistical Association, and a Fellow of the American Association for the Advancement of Science. She is also a recipient of a 2024 IMS Medallion Award and Lectureship, and she is a co-editor of JASA Theory and Methods for 2023-2025.
Thursday, 11/02/2023, Time: 11:00am – 12:00pm, Introduction to the Research Areas of the Faculty (Part 2)
The Faculty of the Department of Statistics and Data Science
Location: Haines Hall A25
The purpose of this seminar is to give an overview of the current research areas of the faculty in the Department of Statistics and Data Science. Each of the faculty will give a flash talk introducing their research and how you can get involved in it. They will focus on those areas which have open problems suitable for Ph.D. dissertations or M.S. or M.A.S. theses. The faculty presenting will be those who did not do so last week.
This is a great way to get a sense of what the faculty do as individuals and what the department does as a whole. There will be time for questions and they are encouraged.
We already have videos and information on the faculty research here. It can be previewed before the meeting.
Thursday, 10/26/2023, Time: 11:00am – 12:00pm, Introduction to the Research Areas of the Faculty (Part 1)
The Faculty of the Department of Statistics and Data Science
Location: Haines Hall A25
The purpose of this seminar is to give an overview of the current research areas of the faculty in the Department of Statistics and Data Science. Each of the faculty will give a flash talk introducing their research and how you can get involved in it. They will focus on those areas which have open problems suitable for Ph.D. dissertations or M.S. or M.A.S. theses.
This is a great way to get a sense of what the faculty do as individuals and what the department does as a whole. There will be time for questions and they are encouraged.
We already have videos and information on the faculty research here. It can be previewed before the meeting.
Thursday, 10/19/2023, Time: 11:00am – 12:15pm, An Integrative Approach to Understanding the Brain Computation: Challenges, Opportunities and Methodologies
Paul Bogdan, Associate Professor
Electrical and Computer Engineering, University of Southern California
Location: Haines Hall A25
Abstract:
Brains build compact models of the world from just a few noisy and conflicting observations. They predict events via memory-based analogies even when resources are limited. The ability of biological intelligence to generalize and complete a wide range of unknown heterogeneous tasks calls for a comprehensive understanding of how networks of interactions among neurons, glia, and vascular systems enable human cognition. This will serve as a basis for advancing the design of artificial general intelligence (AGI). In this talk, we introduce a series of novel mathematical tools which can help us reconstruct networks among neurons, infer their objectives, and identify their learning rules. To decode the network structure from very scarce and noisy data, we develop the first mathematical framework which identifies the emerging causal fractal memory phenomenon in the spike trains and the neural network topologies. We show that the fractional order operators governing the neuronal spiking dynamics provide insight into the topological properties of the underlying neuronal networks and improve the prediction of animal behavior during cognitive tasks. In addition to this, we propose a variational expectation-maximization approach to mine the optical imaging of brain activity and reconstruct the neuronal network generator, namely the weighted multifractal graph generator. Our proposed network generator inference framework can reproduce network properties, differentiate varying structures in the brain networks and chromosomal interactions, and detect topologically associating domain regions in conformation maps of the human genome. Moreover, we develop a multiwavelet-based neural operator in order to infer the objectives and learning rules of complex biological systems. We thus learn the operator kernel of an unknown partial differential equation (PDE) from noisy scarce data. For time-varying PDEs, this model exhibits 2-10X higher accuracy than state-of-the-art machine learning tools.
Bio:
Paul Bogdan is the Jack Munushian Early Career Chair and Associate Professor in the Ming Hsieh Department of Electrical and Computer Engineering at University of Southern California. He received his Ph.D. degree in Electrical & Computer Engineering from Carnegie Mellon University. His work has been recognized with a number of honors and distinctions, including the 2021 DoD Trusted Artificial Intelligence (TAI) Challenge award, the USC Stevens Center 2021 Technology Advancement Award for the AI framework for SARS-CoV-2 vaccine design, the 2019 Defense Advanced Research Projects Agency (DARPA) Director’s Fellowship award, the 2018 IEEE CEDA Ernest S. Kuh Early Career Award, the 2017 DARPA Young Faculty Award, the 2017 Okawa Foundation Award, the 2015 National Science Foundation (NSF) CAREER award, the 2012 A.G. Jordan Award from Carnegie Mellon University for an outstanding Ph.D. thesis and service, and several best paper awards. His research interests include cyber-physical systems, new computational cognitive neuroscience tools for deciphering biological intelligence, the quantification of the degree of trustworthiness and self-optimization of AI systems, new machine learning techniques for complex multi-modal data, the control of complex time-varying networks, the modeling and analysis of biological systems and swarms, new control techniques for dynamical systems exhibiting multi-fractal characteristics, performance analysis and design methodologies for heterogeneous manycore systems.
Thursday, 10/12/2023, Time: 11:00am – 12:15pm, Representation-based Reinforcement Learning
Bo Dai, Assistant Professor / Staff Research Scientist
Georgia Tech and Google DeepMind
Location: Haines Hall A25
Abstract:
The majority of reinforcement learning (RL) algorithms are categorized as model-free or model-based according to whether a simulation model is used in the algorithm. However, both categories have their own issues, especially when incorporating function approximation: exploration with arbitrary function approximation in model-free RL algorithms is difficult, while optimal planning becomes intractable in model-based RL algorithms with neural simulators. In this talk, I will present our recent work on exploiting the power of representation in RL to bypass these difficulties. Specifically, we designed practical algorithms for extracting useful representations, with the goal of improving statistical and computational efficiency in the exploration vs. exploitation tradeoff and empirical performance in RL. We provide a rigorous theoretical analysis of our algorithm, and demonstrate superior practical performance over existing state-of-the-art empirical algorithms on several benchmarks.
Bio:
Bo Dai is an assistant professor at Georgia Tech and a staff research scientist at Google DeepMind. He obtained his Ph.D. from Georgia Tech. His research interest lies in developing principled and practical machine learning methods for Decision AI, including reinforcement learning. He is a recipient of best paper awards at AISTATS and a NeurIPS workshop. He regularly serves as an area chair or senior program committee member at major AI/ML conferences such as ICML, NeurIPS, AISTATS, and ICLR.
Thursday, 10/05/2023, Time: 11:00am – 12:15pm, Comparing Climate Time Series
Tim DelSole, Professor
Department of Atmospheric, Oceanic, and Earth Sciences, George Mason University
Location: Haines Hall A25
Abstract:
In climate science, two questions arise repeatedly: (1) Has climate variability changed over time? (2) Do climate models accurately represent reality? Answering these questions requires a procedure for deciding if two data sets might have originated from the same source. While numerous statistical methods exist for comparing two data sets, most of these methods do not adequately consider spatial and temporal correlations and possible non-stationary signals in a comprehensive test. In this talk, I discuss a method that fills this gap. The basic idea is to assume that each data set comes from a vector autoregressive model. This model can capture typical spatial and temporal correlations in climate data in a parsimonious manner. Furthermore, non-stationary signals can be captured by adding suitable forcing terms. Then, deciding if two data sets came from the same source reduces to deciding if two autoregressive models share the same parameters. A decision rule and associated significance test is derived from the likelihood ratio method. In this talk, I discuss this procedure and additional procedures for isolating the source of any discrepancies. This procedure is applied to assess the realism of climate model simulations of North Atlantic Sea Surface Temperatures. According to this test, every climate model differs stochastically from observations, and differs from every other climate model, except when they originate from the same modeling center. In fact, differences among climate models are distinctive enough to serve as a fingerprint that differentiates a given model from both observations and any other model.
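A stripped-down version of the decision rule, assuming first-order vector autoregressions with Gaussian errors and no forcing terms (the talk's procedure is more general), is sketched below: fit each data set separately and jointly, and compare the fits with a likelihood-ratio statistic.

```python
import numpy as np

def fit_var1(series_list):
    """Least-squares fit of X_t = c + A X_{t-1} + e_t to one or more series
    (stacking their transitions); returns (#transitions, log det residual cov)."""
    Y = np.vstack([X[1:] for X in series_list])
    Z = np.vstack([np.hstack([np.ones((len(X) - 1, 1)), X[:-1]]) for X in series_list])
    B, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    resid = Y - Z @ B
    sigma = resid.T @ resid / len(Y)
    return len(Y), np.linalg.slogdet(sigma)[1]

def same_source_lr(X1, X2):
    """LR statistic for 'both series come from the same VAR(1)'; large values,
    relative to a chi-square with df degrees of freedom, indicate a difference."""
    n1, ld1 = fit_var1([X1])
    n2, ld2 = fit_var1([X2])
    n0, ld0 = fit_var1([X1, X2])
    k = X1.shape[1]
    df = k + k * k + k * (k + 1) // 2          # intercept + A + error covariance
    return n0 * ld0 - n1 * ld1 - n2 * ld2, df

rng = np.random.default_rng(7)
def simulate(a, T, k=3):
    """Simulate a simple stationary VAR(1) with coefficient matrix a * I."""
    X = np.zeros((T, k))
    for t in range(1, T):
        X[t] = a * X[t - 1] + rng.normal(size=k)
    return X

for a2, label in [(0.5, "same model:     "), (0.8, "different model:")]:
    lr, df = same_source_lr(simulate(0.5, 500), simulate(a2, 500))
    print(label, "LR =", round(lr, 1), " df =", df)
```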
Bio:
Tim DelSole is a statistical climate scientist who studies the extent to which future climate changes can be predicted on time scales from weeks to years. He is also a senior research scientist at the Center for Ocean-Land-Atmosphere Studies and serves as co-Chief Editor of the Journal of Climate. After completing his doctorate at Harvard University in 1993, he became a Global Change Distinguished Postdoctoral Fellow for two years and a National Research Council Associate for two years at the NASA Goddard Space Flight Center. In 1997, he joined the GMU Center for Ocean-Land-Atmosphere Studies.