# Monday, 06/05/2023, Time: 11:00am – 12:15pm PT | Transformers As Statisticians: Provable In-Context Learning With In-Context Algorithm Selection

Song Mei, Assistant Professor

Departments of Statistics and Electrical Engineering and Computer Sciences, UC Berkeley

Broad 2100A

Abstract:

Neural sequence models based on the transformer architecture have demonstrated remarkable in-context learning (ICL) abilities, where they can perform new tasks when prompted with training and test examples, without any parameter update to the model. In this talk, we theoretically investigate the strong ICL abilities of transformers. We first provide a statistical theory for transformers to perform ICL by deriving end-to-end quantitative results for the expressive power, in-context prediction power, and sample complexity of pre-training. Concretely, we show that transformers can implement a broad class of standard machine learning algorithms in context, such as least squares, ridge regression, Lasso, convex risk minimization for generalized linear models (such as logistic regression), and gradient descent on two-layer neural networks, with near-optimal predictive power on various in-context data distributions. Building on these “base” ICL algorithms, intriguingly, we show that transformers can implement more complex ICL procedures involving in-context algorithm selection, akin to what a statistician can do in real life: a single transformer can adaptively select different base ICL algorithms, or even perform qualitatively different tasks, on different input sequences, without any explicit prompting of the right algorithm or task.

Bio:

Song Mei is an assistant professor of statistics at UC Berkeley. He received his Ph.D. from Stanford in June 2020. Song’s research lies at the intersection of statistics and machine learning. His recent research interests include high-dimensional statistical inference, theory of deep learning, and theory of reinforcement learning.

# Thursday, 06/01/2023, Time: 11:00am – 12:15pm PT | Spectral Learning for High Dimensional Tensors

Ming Yuan, Professor

Department of Statistics, Columbia University

Young Hall CS76

Abstract:

Matrix perturbation bounds developed by Weyl, Davis, Kahan, Wedin and others play a central role in many statistical and machine learning problems. I shall discuss some of the recent progress in developing similar bounds for higher-order tensors. I will highlight the intriguing differences from matrices, and explore their implications in spectral learning problems.
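
For readers unfamiliar with the matrix bounds mentioned in the abstract, here is a minimal sketch of Weyl's inequality for symmetric matrices: each sorted eigenvalue of A + E differs from the corresponding eigenvalue of A by at most the spectral norm of E. The 2x2 restriction, the closed-form eigenvalue helper, and the example matrices are illustrative assumptions, not material from the talk, and the tensor bounds it discusses are not covered here.

```python
import math

def sym2_eigenvalues(a, b, c):
    """Eigenvalues, in ascending order, of the symmetric matrix [[a, b], [b, c]]."""
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return mean - radius, mean + radius

def weyl_gap_and_bound(A, E):
    """Return (largest eigenvalue shift, spectral norm of E).

    A and E are (a, b, c) triples encoding symmetric 2x2 matrices [[a, b], [b, c]].
    Weyl's inequality says the first value never exceeds the second.
    """
    perturbed = tuple(x + y for x, y in zip(A, E))
    shifts = [
        abs(p - q)
        for p, q in zip(sym2_eigenvalues(*perturbed), sym2_eigenvalues(*A))
    ]
    # For a symmetric matrix the spectral norm is the largest absolute eigenvalue.
    spectral_norm = max(abs(v) for v in sym2_eigenvalues(*E))
    return max(shifts), spectral_norm

A = (2.0, 1.0, -1.0)     # hypothetical "signal" matrix
E = (0.05, -0.02, 0.03)  # hypothetical small symmetric perturbation
gap, bound = weyl_gap_and_bound(A, E)
print(f"max eigenvalue shift = {gap:.4f}, spectral norm of E = {bound:.4f}")
```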

Bio:

Ming Yuan is a Professor of Statistics and an Associate Director of the Data Science Institute at Columbia University. He was previously a Senior Investigator in Virology at Morgridge Institute for Research and a Professor of Statistics at University of Wisconsin at Madison, and prior to that Coca-Cola Junior Professor of Industrial and Systems Engineering at Georgia Institute of Technology. His research and teaching interests lie broadly in statistics and its interface with other quantitative and computational fields such as optimization, machine learning, computational biology, and financial engineering. He has served as the program secretary of the Institute for Mathematical Statistics (IMS), and a member of the advisory board for the Quality, Statistics and Reliability section of the Institute for Operations Research and the Management Sciences (INFORMS). He was also a co-Editor of The Annals of Statistics and has served on numerous editorial boards. He was named a Senior Fellow of the Institute for Theoretical Studies at ETH Zurich (2019), a Medallion Lecturer of IMS (2018), and a recipient of the Leo Breiman Junior Researcher Award (2017; American Statistical Association), the Guy Medal in Bronze (2014; Royal Statistical Society), and CAREER Award (2009; US National Science Foundation).

# Thursday, 05/25/2023, Time: 11:00am – 12:15pm PT | Minimax Theory for Causal Inference

Sivaraman Balakrishnan, Associate Professor

Departments of Statistics and Data Science, and Machine Learning, Carnegie Mellon University

Young Hall CS76

Abstract:

The causal analysis of (observational) data plays a central role in essentially every scientific field. Many recent developments in causal inference, and functional estimation problems more generally, have been motivated by the fact that classical one-step de-biasing methods, or their more recent sample-split double machine-learning avatars, can outperform plugin estimators under surprisingly weak conditions. However, from a theoretical perspective our understanding of how to construct these estimators for non-standard functionals, how to assess their optimality, and how to improve them, is still far from complete.

I will present two vignettes within this theme. The first part develops minimax theory for estimating the conditional average treatment effect (CATE). Many methods for estimating CATEs have been proposed, but there remain important theoretical gaps in understanding if and when such methods are optimal. We close some of these gaps by providing sharp minimax rates for estimating the CATE when the nuisance functions are Hölder smooth, highlighting important differences between the estimation of the CATE and its more well-studied global counterpart (the average treatment effect).

In the second part, I will focus more broadly on functional estimation problems, and develop some minimax lower bounds for “structure-agnostic” functional estimation, to understand the strengths and limitations of the double machine learning perspective on functional estimation.

This talk will be based on joint work with Edward Kennedy and Larry Wasserman.

Bio:

Sivaraman is an Associate Professor with a joint appointment in the Department of Statistics and Data Science, and the Machine Learning Department at Carnegie Mellon. Prior to this he was a postdoctoral researcher at UC Berkeley working with Martin Wainwright and Bin Yu, and before that was a PhD student in Computer Science at Carnegie Mellon. His research interests are broadly in statistical machine learning and algorithmic statistics. Some particular areas that he is currently most fascinated by include robust statistics, minimax testing, optimal transport and causal inference.

# Thursday, 05/18/2023, Time: 11:00am – 12:15pm PT | Fitting The Degree Distribution Of Real Networks: A Mixture Truncated Zeta Model And A Weighted Evolving Hypergraph Model

Frederick Kin Hing Phoa, Professor

Institute of Statistical Science, Academia Sinica

Young Hall CS76

Abstract:

The degree distribution has attracted considerable attention from network scientists over the last few decades as a way to understand the topological structure of networks. It is widely acknowledged that many real networks have power-law degree distributions. However, deviations from such behavior often appear when the range of degrees is small. Even worse, the conventional use of the continuous power-law distribution usually leads to inaccurate inference, since degrees are discrete-valued. In this talk, we propose two different approaches to remedy these obstacles: a finite mixture model of truncated zeta distributions and a weighted evolving hypergraph model. The first model targets a broad range of degrees that deviates from power-law behavior in the range of small degrees while maintaining the scale-free behavior. The second model theoretically derives the exact asymptotic degree distribution under mild distributional conditions on the number and the size of hyperedges to be connected. We apply these two models to fit the degree distributions of scientific collaboration networks extracted from the Web of Science.
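
As a rough illustration of the first model's building block, the sketch below computes the probability mass function of a truncated zeta (discrete power-law) distribution and of a finite mixture of such components. The parameterization (exponent `alpha`, support `k_min..k_max`, mixing weights) and the example values are assumptions chosen for illustration; the abstract does not specify the model's exact form or how it is fitted.

```python
def truncated_zeta_pmf(alpha, k_min, k_max):
    """P(K = k) proportional to k**(-alpha) on the integers k_min..k_max."""
    if k_min < 1 or k_max < k_min:
        raise ValueError("need 1 <= k_min <= k_max")
    support = range(k_min, k_max + 1)
    weights = [k ** (-alpha) for k in support]
    total = sum(weights)  # finite normalizing constant because the support is truncated
    return {k: w / total for k, w in zip(support, weights)}

def mixture_pmf(components, mixing_weights):
    """Finite mixture of truncated zeta components.

    components: list of (alpha, k_min, k_max) triples (hypothetical parameterization).
    mixing_weights: nonnegative weights that sum to 1, one per component.
    """
    if len(components) != len(mixing_weights):
        raise ValueError("need one mixing weight per component")
    if abs(sum(mixing_weights) - 1.0) > 1e-9:
        raise ValueError("mixing weights must sum to 1")
    pmf = {}
    for (alpha, k_min, k_max), weight in zip(components, mixing_weights):
        for k, p in truncated_zeta_pmf(alpha, k_min, k_max).items():
            pmf[k] = pmf.get(k, 0.0) + weight * p
    return pmf

# Hypothetical example: a flatter component on small degrees plus a heavier-tailed one.
degree_pmf = mixture_pmf([(1.2, 1, 10), (2.5, 1, 1000)], [0.3, 0.7])
print(sum(degree_pmf.values()))  # should be 1 up to floating-point error
```

Because the support is discrete and bounded, the normalizing constant is a finite sum, which avoids the continuous approximation the abstract warns against.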

Bio:

Frederick Kin Hing Phoa is a research fellow of the Institute of Statistical Science at Academia Sinica (ISSAS) in Taiwan. He has been with ISSAS since 2009, after graduating from the University of California at Los Angeles in the USA. He was an assistant research fellow of ISSAS during 2009-2013 and an associate research fellow of ISSAS during 2013-2018. He was also appointed as an adjunct professor at National Taiwan University, National Central University and National Chiao Tung University in Taiwan.

His research areas of interest include design and analysis of experiments; methods of variable/model selection; network data and internet data analysis; nature-inspired metaheuristics optimization; big data analytics and deep learning techniques; semiparametric methods in data with missing covariates; statistical control; and scientometrics. Dr. Phoa has published scientific articles in statistical research journals including The Annals of Statistics, Statistica Sinica, Technometrics and many others, and in interdisciplinary and computing research journals including Physical Review B, Physica A, Network Science, Chemometrics, Swarm and Evolutionary Optimization and many others. Dr. Phoa is the recipient of Career Development Award (Academia Sinica) (2014), Ta-You Wu Award (MOST Young Researcher Award) (2015), Best Paper Award at World Congress of Engineering (2015), Outstanding Research Award (MOST) (2018), Outstanding Scholar Award (Foundation for the Advancement of Outstanding Scholarship) (2020), and Investigator Award (Academia Sinica) (2022). Dr. Phoa also received a two-year international cost-share exchange support between MOST (Taiwan) and Royal Society (UK) in 2016, and a one-year international travel support between MOST (Taiwan) and Royal Society (Edinburgh). He was also invited to join a multi-year national project on analyzing large-scale citation network (Web of Science) in the Institute of Statistical Mathematics (ISM) in Japan since 2017, a multi-year interdisciplinary joint project on applying artificial intelligence to upgrade layer industry in the Ministry of Science and Technology (MOST) in Taiwan since 2018, and a multi-year thematic project on the structure, property, analytical techniques and visualization of large-scale real-time network data since 2020. Dr. Phoa has served as an associate editor of Computational Statistics, Japanese Journal of Statistics, Journal of Statistical Planning and Inference, and Technometrics. He was also a conference organizer of four CEDAs (2014, 2016, 2018, 2023), three ISM-ISI-ISSAS Joint Conferences (2022, 2023, 2024), and Research Metrics Workshop 2023.

# Thursday, 05/11/2023, Time: 11:00am – 12:15pm PT | A Picture Of The Prediction Space Of Deep Networks

Pratik Chaudhari, Assistant Professor

Electrical and Systems Engineering, University of Pennsylvania

Young Hall CS76

Abstract:

Deep networks have many more parameters than the number of training data and can therefore overfit, and yet they predict remarkably accurately in practice. Training such networks is a high-dimensional, large-scale and non-convex optimization problem and should be prohibitively difficult, and yet it is quite tractable. This talk aims to illuminate these puzzling contradictions. We will argue that deep networks generalize well because of a characteristic structure in the space of learnable tasks. The input correlation matrix for typical tasks has a “sloppy” eigenspectrum where, in addition to a few large eigenvalues, there is a large number of small eigenvalues that are distributed uniformly over a very large range. As a consequence, the Hessian and the Fisher Information Matrix of a trained network also have a sloppy eigenspectrum. Using these ideas, we will demonstrate an analytical non-vacuous PAC-Bayes generalization bound for general deep networks. We will next develop information-geometric techniques to analyze the trajectories of the predictions of deep networks during training. By examining the underlying high-dimensional probabilistic models, we will reveal that the training process explores an effectively low-dimensional manifold. Networks with a wide range of architectures and sizes, trained using different optimization methods, regularization techniques, data augmentation techniques, and weight initializations, lie on the same manifold in the prediction space. We will also show that predictions of networks being trained on different tasks (e.g., different subsets of ImageNet) using different representation learning methods (e.g., supervised, meta-, semi-supervised and contrastive learning) also lie on a low-dimensional manifold.

References:

- Does the data induce capacity control in deep learning? Rubing Yang, Jialin Mao, and Pratik Chaudhari. [ICML ’22] https://arxiv.org/abs/2110.14163
- Deep Reference Priors: What is the best way to pretrain a model? Yansong Gao, Rahul Ramesh, and Pratik Chaudhari. [ICML ’22] https://arxiv.org/abs/2202.00187
- A picture of the space of typical learnable tasks. Rahul Ramesh, Jialin Mao, Itay Griniasty, Rubing Yang, Han Kheng Teoh, Mark Transtrum, James P. Sethna, and Pratik Chaudhari. [ICML ’23] https://arxiv.org/abs/2210.17011
- The Training Process of Many Deep Networks Explores the Same Low-Dimensional Manifold. Jialin Mao, Itay Griniasty, Han Kheng Teoh, Rahul Ramesh, Rubing Yang, Mark K. Transtrum, James P. Sethna, and Pratik Chaudhari. 2023. https://arxiv.org/abs/2305.01604

Bio:

Pratik Chaudhari is an Assistant Professor in Electrical and Systems Engineering and Computer and Information Science at the University of Pennsylvania. He is a core member of the GRASP Laboratory. From 2018-19, he was a Senior Applied Scientist at Amazon Web Services and a Postdoctoral Scholar in Computing and Mathematical Sciences at Caltech. Pratik received his PhD (2018) in Computer Science from UCLA, and his Master’s (2012) and Engineer’s (2014) degrees in Aeronautics and Astronautics from MIT. He was a part of NuTonomy Inc. (now Hyundai-Aptiv Motional) from 2014-16. He is the recipient of the Amazon Machine Learning Research Award (2020), NSF CAREER award (2022) and the Intel Rising Star Faculty Award (2022).

# Thursday, 05/04/2023, Time: 11:00am – 12:15pm PT | Strategic Interactions through Data: From Cooperation to Competition

Eric Mazumdar, Assistant Professor

Computing and Mathematical Sciences and Economics, California Institute of Technology

Young Hall CS76

Abstract:

From matching drivers and riders on ride-sharing platforms to choosing where to deploy police in cities, machine learning algorithms are increasingly becoming the interface between stakeholders in socio-technical systems. The use of algorithms for critical decision-making, however, has highlighted key failures in our current approach to algorithm design in machine learning. Many of these failures stem from the interaction between algorithms and strategic agents. In this talk I will discuss some progress towards understanding these interactions with a particular emphasis on the fact that these interactions increasingly occur through data.

First, I will discuss some recent work on understanding the leverage that people have over the algorithms that use their data. While individuals have little to no power to systematically alter the output of a learning algorithm, we propose a simple theoretical model of a collective interacting with a firm’s learning algorithm. Our theory suggests that small collectives of people can exert significant control over learning algorithms. We validate our predictions on a classification task involving tens of thousands of resumes from a gig platform for freelancers. Through more than two thousand model training runs of a BERT-like language model, we see a striking correspondence emerge between our empirical observations and the predictions made by our theory.

I will conclude by describing the design of new algorithms for multi-agent reinforcement learning (MARL). Despite their successes, current approaches to MARL are often heuristic or assume centralized control of all agents in the game. I will discuss recent work on designing principled algorithms for zero-sum Markov games in which agents only receive reward information and do not observe each other’s policies or actions. In particular, we develop a rational, payoff-based independent learning algorithm for MARL and give the first-known sample complexity bound for payoff-based independent learning in zero-sum Markov games.

Bio:

Eric Mazumdar is an Assistant Professor in Computing and Mathematical Sciences and Economics at Caltech. His research lies at the intersection of machine learning and economics, where he is broadly interested in developing the tools and understanding necessary to confidently deploy machine learning algorithms into societal-scale systems. Eric is the recipient of an NSF CAREER Award and was a fellow at the Simons Institute for Theoretical Computer Science. He obtained his Ph.D. in Electrical Engineering and Computer Science at UC Berkeley, where he was advised by Michael Jordan and Shankar Sastry, and received his B.S. in Electrical Engineering and Computer Science at the Massachusetts Institute of Technology (MIT).

# Thursday, 04/27/2023, Time: 11:00am – 12:15pm PT | Generalized Data Thinning Using Sufficient Statistics

Jacob Bien, Associate Professor

Data Science and Operations, Business Administration at the Marshall School of Business, USC

Young Hall CS76

Abstract:

Sample splitting is one of the most tried-and-true tools in the data scientist’s toolbox. It breaks a data set into two independent parts, allowing one to perform valid inference after an exploratory analysis or after training a model. A recent paper (Neufeld et al., 2023) provided a remarkable alternative to sample splitting, which the authors showed to be attractive in situations where sample splitting is not possible. Their method, called convolution-closed data thinning, proceeds very differently from sample splitting, and yet it also produces two statistically independent data sets from the original. In this talk, we will show that sufficiency is the key underlying principle that makes their approach possible. This insight leads naturally to a new framework, which we call generalized data thinning. This generalization unifies both sample splitting and convolution-closed data thinning as different applications of the same procedure. Furthermore, we show that this generalization greatly widens the scope of distributions where thinning is possible. This work is a collaboration with Ameer Dharamshi, Anna Neufeld, Keshav Motwani, Lucy Gao, and Daniela Witten.
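
To make the idea of thinning concrete, here is a minimal sketch of the classical Poisson case: if X ~ Poisson(λ) and X1 | X ~ Binomial(X, ε), then X1 and X2 = X − X1 are independent Poisson variables with means ελ and (1 − ε)λ. The function names, the sampler, and the parameter values below are illustrative assumptions; the sketch does not implement the generalized framework from the talk.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def thin_poisson(x, eps, rng):
    """Split a count x into (x1, x2) with x1 | x ~ Binomial(x, eps) and x2 = x - x1."""
    x1 = sum(1 for _ in range(x) if rng.random() < eps)
    return x1, x - x1

def correlation(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

rng = random.Random(0)           # fixed seed so the demo is repeatable
lam, eps = 5.0, 0.3              # hypothetical rate and thinning fraction
counts = [sample_poisson(lam, rng) for _ in range(5000)]
folds = [thin_poisson(x, eps, rng) for x in counts]
part1 = [a for a, _ in folds]
part2 = [b for _, b in folds]
# The two parts always add back to the original count, yet are (nearly) uncorrelated.
print("mean of part 1:", sum(part1) / len(part1))
print("correlation between parts:", correlation(part1, part2))
```

Unlike sample splitting, every observation contributes to both parts, which is why thinning remains usable when there is only a single observation per parameter.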

Bio:

Jacob Bien is an associate professor of Data Science and Operations and the Dean’s Associate Professor in Business Administration at the Marshall School of Business, University of Southern California. His research focuses on statistical machine learning and in particular on the development of new methods that will be of direct use to scientists and others with large datasets. His work has been supported by various grants, including an NSF CAREER award and grants focused on developing statistical methods in biomedicine and oceanography. He serves as an associate editor of the Journal of the American Statistical Association and the Journal of the Royal Statistical Society (Series B) and was formerly an associate editor for Biometrika and two other journals. Before joining USC, he was an assistant professor at Cornell.

# Thursday, 04/20/2023, Time: 11:00am – 12:15pm PT | Generative Decision Making Under Uncertainty

Aditya Grover, Assistant Professor

Department of Computer Science, UCLA

Young Hall CS76

Abstract:

The ability to make sequential decisions under uncertainty is a key component of intelligence. Despite impressive breakthroughs in deep learning in the last decade, we find that scalable and generalizable decision making has so far been elusive for current artificial intelligence (AI) systems. In this talk, I will present a new framework for sequential decision making that is derived from modern generative models for language and perception. We will instantiate our framework in 3 different paradigms for sequential decision making: offline reinforcement learning (RL), online RL, and black-box optimization, and highlight the simplicity and effectiveness of this unifying framework on a range of challenging high-dimensional benchmarks for sequential decision making.

Bio:

Aditya Grover is an assistant professor of computer science at UCLA. His goal is to develop efficient machine learning approaches that can interact and reason with limited supervision, with a focus on deep generative models and sequential decision making. He is also an affiliate faculty member at the UCLA Institute of the Environment and Sustainability, where he grounds his research in real-world applications in climate science. Aditya’s 40+ research works have been published at top venues including Nature, deployed in production at major technology companies such as Twitter and Instagram, and covered in popular press venues such as Wall Street Journal and Wired. His research has been recognized with two best paper awards, four graduate research fellowships, four industry faculty awards, the ACM SIGKDD doctoral dissertation award, and the AI Researcher of the Year Award by Samsung. Aditya received his postdoctoral training at UC Berkeley, PhD from Stanford, and bachelor’s from IIT Delhi, all in computer science.

# Wednesday, 04/12/2023, Time: 3:30pm – 4:30pm PT | Inference-based Monte Carlo Integrations: A Likelihood or Bayesian Paradox?

# Thursday, 04/06/2023, Time: 11:00am – 12:15pm PT | A Theoretically Tractable Cross Validation Framework For Signal Denoising

Sabyasachi Chatterjee, Assistant Professor

Department of Statistics, University of Illinois at Urbana Champaign

Young Hall CS76

Abstract:

We formulate a general cross validation framework for signal denoising. The general framework is then applied to nonparametric regression methods such as Trend Filtering and Dyadic CART. The resulting cross validated versions are then shown to attain nearly the same rates of convergence as are known for the optimally tuned analogues. No previous theoretical analyses of cross validated versions of Trend Filtering or Dyadic CART existed. Our general framework is inspired by the ideas in Chatterjee and Jafarov (2015) and is potentially applicable to a wide range of estimation methods which use tuning parameters.

Bio:

Dr. Chatterjee is an Assistant Professor (from 2017 onwards) in the Statistics Department at University of Illinois at Urbana Champaign. Most of his research has been in Statistical Signal Processing. He is also interested in Probability and all theoretical aspects of Machine Learning. He obtained his PhD in 2014 at Yale University and then was a Kruskal Instructor at University of Chicago till 2017.

# Thursday, 03/16/2023, Time: 11:00am – 12:15pm PT | Statistical models for mixed frequency data and their applications in forecasting economic indicators

George Michailidis, Professor

Department of Statistics, UCLA

Young Hall CS50

Abstract:

We discuss the problem of modeling and analysis of time series data that evolve at different frequencies (e.g., quarterly-monthly). We present univariate and multivariate modeling paradigms, outline and address technical challenges and illustrate their performance in forecasting tasks involving key macroeconomic indicators.

Bio:

George Michailidis is Professor of Statistics at UCLA. Prior to joining the Department of Statistics, he served as the founding Director for the U Florida Informatics Institute and before that he was Professor of Statistics and EECS at the University of Michigan. His research interests are on modeling and analysis of high dimensional data, on developing optimization algorithms for machine learning models and on stochastic control problems. Applications of research include analysis of Omics data, as well as studying computer, communication, transportation, social and financial networks.

# Thursday, 03/03/2023, Time: 11:00am – 12:15pm PT | Machine Learning, Graphs, and Machine Learning on Graphs: from Kernel Methods to Predictive Medicine

Sanjukta Krishnagopal, UC Presidential Postdoctoral Fellow

UC Berkeley Mathematics and UCLA Mathematics

Young Hall CS50

Abstract:

Graphs are a natural way to model time-varying interactions between elements in various systems – social, political, biological, etc. First, I present some rigorous results on how graph neural networks, a form of machine learning (ML) for graphical data that are often treated as black boxes, learn. In particular, I use the neural tangent kernel to understand the precise learning dynamics of a graph neural network in the wide-network limit. But how does this kernel evolve as I grow the underlying graph in the graphon limit? I introduce the graphon-kernel, and prove convergence to the graphon-kernel (and convergence of the corresponding spectra) as the underlying graph grows. Through this, I show how one can perform ‘transfer learning’, i.e., train on a smaller network (e.g. a subgraph of a social media network) and ‘transfer’ rapidly to a larger network (e.g. the entire social media network) with theoretical performance and early-stopping guarantees.

Next, I introduce an entirely new framework for training neural networks in the first place, with several advantages. Virtually *all* ML architectures use backpropagation (a form of gradient descent) for learning; however, backpropagation is notoriously opaque, data-intensive, and rather slow. In recent work, we introduce dendritic gated networks, which use linear methods (that by themselves cannot learn even the simplest non-linear functions) in conjunction with the new addition of hyperplane-gating, allowing them to match the performance of conventional ML methods on a variety of benchmarked tasks. I show why this new paradigm has desirable properties such as being significantly faster, not forgetting old tasks while learning new ones (i.e., avoiding catastrophic forgetting), being less prone to overfitting, and having smooth and interpretable weight functions, unlike conventional neural networks.

Lastly, I briefly discuss a research program on developing data-driven methods for studying time-evolving social and biological systems. In particular, I discuss a multilayer-network clustering algorithm for *subtyping* and *early prediction* of diseases that are characterized by time-evolving phenotypic-genotypic interactions, applied to diseases such as Parkinson’s and stroke. In Parkinson’s, we predict disease trajectory with ~80% accuracy an entire 5 years in advance, allowing for early prediction and personalized treatment.

Bio:

Sanjukta’s research lies at the interface of network science, machine learning, statistics and data science. She draws from her interdisciplinary training to develop methods, sometimes mathematically rigorous and sometimes application-driven, to bridge the gap between quantitative computational methods and data-driven analyses for broad impact in interdisciplinary sciences. She received her PhD in Physics from the University of Maryland, where she worked on graphs and complex systems and was a fellow in the COMBINE (Computation and Mathematics of Biological Networks) program. She then spent two years as a postdoc at the Gatsby Unit at University College London, where she also worked with Google DeepMind. She is currently a UC presidential postdoctoral fellow with a joint appointment in UC Berkeley and UCLA Math. She is also a visiting researcher at the Simons Institute for the Theory of Computing. She has lived on 4 continents, and enjoys dancing, diving and hiking in her spare time.

# Thursday, 02/16/2023, Time: 11:00am – 12:15pm PT | The Power and Limitations of Convexity in Data Science

Oscar F. Leong, von Karman Instructor

Computing and Mathematical Sciences, California Institute of Technology

Young Hall CS50

Abstract:

Optimization is a fundamental pillar of data science. Traditionally, the art and challenge in optimization lay primarily in problem formulation to ensure desirable properties such as convexity. In the context of contemporary data science, however, optimization is practiced differently, with scalable local search methods applied to nonconvex objectives being the dominant paradigm in high-dimensional problems. This has brought a number of foundational mathematical challenges at the interface between optimization and data science pertaining to the dichotomy between convexity and nonconvexity. In this talk, I will discuss some of my work addressing these challenges in regularization, a technique to encourage structure in solutions to statistical estimation and inverse problems. Even setting aside computational considerations, we currently lack a systematic understanding from a modeling perspective of what types of geometries should be preferred in a regularizer for a given data source. In particular, given a data distribution, what is the optimal regularizer for such data and what are the properties that govern whether it is amenable to convex regularization? Using ideas from star geometry, Brunn-Minkowski theory, and variational analysis, I show that we can characterize the optimal regularizer for a given distribution and establish conditions under which this optimal regularizer is convex. Moreover, I describe results establishing the robustness of our approach, such as convergence of optimal regularizers with increasing sample size and statistical learning guarantees with applications to several classes of regularizers of interest.

Bio:

Dr. Oscar Leong is a von Karman Instructor in the Computing and Mathematical Sciences department at Caltech, hosted by Dr. Venkat Chandrasekaran. He also works with Dr. Katherine L. Bouman and the Computational Cameras group. He received his undergraduate degree in Mathematics from Swarthmore College and his PhD in 2021 in Computational and Applied Mathematics from Rice University under the supervision of Dr. Paul Hand, where he was an NSF Graduate Fellow. His research interests lie in the mathematics of data science, optimization, and machine learning, where he studies the theory and application of learning-based methods to solve inverse problems. He is broadly interested in using tools from convex geometry, high-dimensional statistics, and nonlinear optimization to better understand and improve data-driven, decision-making algorithms.

# Tuesday, 02/14/2023, Time: 11:00am – 12:15pm PT | Learning Systems in Adaptive Environments. Theory, Algorithms and Design

Aldo Pacchiano, Postdoctoral Researcher

Microsoft Research

Mathematical Sciences 8359

Abstract:

Recent years have seen great successes in the development of learning algorithms in static predictive and generative tasks, where the objective is to learn a model that performs well on a single test deployment and in applications with abundant data. Comparatively less success has been achieved in designing algorithms for deployment in adaptive scenarios where the data distribution may be influenced by the choices of the algorithm itself, the algorithm needs to adaptively learn from human feedback, or the nature of the environment is rapidly changing. These are some of the most important challenges in the development of ML-driven solutions for technologies such as internet social systems, ML-driven scientific experimentation, and robotics. To fully realize the potential of these technologies, we will need better ways of designing algorithms for adaptive learning. In this talk I propose the following algorithm design considerations for adaptive environments: 1) data-efficient learning, 2) generalization to unseen domains via effective knowledge transfer, and 3) adaptive learning from human feedback. I will give an overview of my work along each of these axes and introduce a variety of open problems and research directions inspired by this conceptual framing.

Bio:

Aldo Pacchiano is a Postdoctoral Researcher at Microsoft Research NYC. He obtained his PhD at UC Berkeley where he was advised by Peter Bartlett and Michael Jordan. His research lies in the areas of Reinforcement Learning, Online Learning, Bandits and Algorithmic Fairness. He is particularly interested in furthering our statistical understanding of learning phenomena in adaptive environments and using these theoretical insights and techniques to design efficient and safe algorithms for scientific, engineering, and large-scale societal applications.

# Thursday, 02/09/2023, Time: 11:00am – 12:15pm PT Combining biased and unbiased data for estimating stratified COVID-19 infection fatality rates

Gonzalo E. Mena, Postdoctoral Fellow

Department of Statistics, University of Oxford

Young Hall CS50

Abstract:

One major limitation of so-called ‘big data’ is that bigger sample sizes don’t lead to more reliable conclusions if data is corrupted by bias. However, responding to complex scientific and societal questions requires us to think about how to draw inferences out of such corrupted data efficiently. One emerging paradigm consists of suitably combining unbiased (but typically small and expensive) with biased (but cheap and bigger) datasets. Unfortunately, although Bayesian inference is a major workhorse of modern scientific research, methods for combining information within this paradigm are still lacking, and we often have to content ourselves with the suboptimal solution of throwing away all biased data.

In this talk, I will present a computationally efficient Bayesian method for combining biased and unbiased data enjoying theoretical guarantees. This method is based on a predictive philosophy: given a family of Bayesian models indexed by an unknown parameter representing how data should be merged, we seek to find the value that will best predict unobserved units given the rest of the observed ones. I study in-depth the performance of our method in the Gaussian case, showing that if D is greater than 8, then including biased data is always better than not doing so. Moreover, I show that it enjoys a certain robustness property, making it preferable to the best available baseline, the Green-Strawderman shrinkage estimator. This criterion can be seamlessly implemented through leave-one-out cross-validation in usual probabilistic programming pipelines, and I show through simulations that benefits manifest in more complex scenarios as well, for example, in hierarchical models.

I apply these methods to one important scientific and policy sensitive question: determining how COVID-19 lethality depends on age and socioeconomic status. This problem is remarkably hard since lethality is defined in terms of the true number of infections, a quantity that we typically observe with bias. Using small-area data from Chile, I present three stratified examples based on biased (administrative surveillance data), unbiased (a serosurvey) and biased + unbiased data to confirm the result that there is a strong dependence of lethality on socioeconomic status among younger populations.

Bio:

Gonzalo Mena is a Florence Nightingale Fellow in Computational Statistics and Machine Learning at the Department of Statistics, University of Oxford. Prior to that he was a Data Science Initiative Postdoctoral fellow at Harvard University. He earned his PhD in Statistics at Columbia University advised by Liam Paninski. Before his PhD, he obtained a bachelor’s degree in Mathematical Engineering at Universidad de Chile, in his home country. His main research motivation is the development of statistical methods to address complex scientific and societal problems.

# Thursday, 02/02/2023, Time: 11:00am – 12:15pm PT To Adjust or not to Adjust? Estimating the Average Treatment Effect in Randomized Experiments with Missing Covariates

Anqi Zhao, Assistant Professor

Department of Statistics and Data Science, National University of Singapore

Young Hall CS50

Abstract:

Randomized experiments allow for consistent estimation of the average treatment effect based on the difference in mean outcomes without strong modeling assumptions. Appropriate use of pretreatment covariates can further improve the estimation efficiency. Missingness in covariates is nevertheless common in practice and raises an important question: should we adjust for covariates subject to missingness, and if so, how? The unadjusted difference in means is always unbiased. The complete-covariate analysis adjusts for all completely observed covariates, and is asymptotically more efficient than the difference in means if at least one completely observed covariate is predictive of the outcome. Then what is the additional gain of adjusting for covariates subject to missingness? To reconcile the conflicting recommendations in the literature, we analyze and compare five strategies for handling missing covariates in randomized experiments under the design-based framework, and recommend the missingness-indicator method, a known but not so popular strategy in the literature, due to its multiple advantages. First, it removes the dependence of the regression-adjusted estimators on the imputed values for the missing covariates. Second, it does not require modeling the missingness mechanism, and yields consistent estimators even when the missingness mechanism is related to the missing covariates and unobservable potential outcomes. Third, it ensures large-sample efficiency over the complete-covariate analysis and the analysis based on only the imputed covariates. Lastly, it is easy to implement via least squares. We also propose modifications to it based on asymptotic and finite sample considerations. Importantly, our theory views randomization as the basis for inference, and does not impose any modeling assumptions on the data generating process or missingness mechanism.

Bio:

Anqi Zhao is an assistant professor in the Department of Statistics and Data Science, National University of Singapore (NUS). She got her PhD in statistics from Harvard in 2016 and joined NUS in 2019 after an excursion to the management consulting world. Her research interests include experimental design and causal inference from randomized experiments and observational studies.

# Tuesday, 01/31/2023, Time: 11:00am – 12:15pm PT Iterative Proximal Algorithms for Parsimonious Estimation

Alfonso Landeros, Postdoctoral Scholar

Computational Medicine, UCLA

Mathematical Sciences 8359

Abstract:

Statistical methods often involve solving an optimization problem, such as in maximum likelihood estimation and regression. The addition of constraints, either to enforce a hard requirement in estimation or to regularize solutions, complicates matters. Fortunately, the rich theory of convex optimization provides ample tools for devising novel methods. In this talk, I present applications of distance-to-set penalties to statistical learning problems. Specifically, I will focus on proximal distance algorithms, based on the MM principle, tailored to various applications such as regression and discriminant analysis. Emphasis is given to sparsity set constraints as a compromise between exhaustive combinatorial searches and lasso penalization methods that induce shrinkage.

Bio:

Alfonso Landeros is a postdoctoral scholar at the University of California, Los Angeles, advised by Kenneth Lange. A Bruin for most of his adult life, he obtained his Ph.D. at UCLA in March 2021 under the supervision of Kenneth Lange and Dr. Mary Sehl. Before that, he completed a B.S. in Mathematics/Applied Science at UCLA in June 2013. His research interests span applied probability, mathematical optimization, computational statistics, and their applications to questions in genomics, cancer, epidemiology, and immunology.

# Thursday, 01/26/2023, Time: 11:00am – 12:15pm PT Machine Learning Through the Lens of Differential Equations

Yuhua Zhu, Assistant Professor

The Halicioğlu Data Science Institute (HDSI) and the Department of Mathematics, UCSD

Young Hall CS50

Abstract:

In this talk, I will explore the rich interplay between differential equations and machine learning. I will highlight the use of collective dynamics and partial differential equations as powerful tools for improving machine learning algorithms and models. (i) In the first half of the talk, I will introduce a novel dynamical system that draws inspiration from collective intelligence observed in biology. This system offers a compelling alternative to gradient-based optimization. It enables gradient-free optimization to efficiently find global minimum in non-convex optimization problems. (ii) In the second half of the talk, I will build the connection between Hamilton-Jacobi-Bellman equations and the multi-armed bandit (MAB) problems. MAB is a widely used paradigm for studying the exploration-exploitation trade-off in sequential decision making under uncertainty. This is the first work that establishes this connection in a general setting. I will present an efficient algorithm for solving MAB problems based on this connection and demonstrate its practical applications.

Bio:

Yuhua Zhu is an assistant professor at UC San Diego, where she holds a joint appointment in the Halicioğlu Data Science Institute (HDSI) and the Department of Mathematics. Previously, she was a Postdoctoral Fellow at Stanford University, and received her Ph.D. from UW-Madison. Her work builds the bridge between differential equations and machine learning, spanning the areas of reinforcement learning, stochastic optimization, sequential decision-making, and uncertainty quantification. She won the John A. Nohel award, the Elizabeth S. Hirschfelder award at UW-Madison, and an NSF IIP award as the technology lead during her postdoc.

# Tuesday, 01/24/2023, Time: 11:00am – 12:15pm PT Towards a Foundation for Reinforcement Learning

Andrea Zanette, Postdoctoral Scholar

Department of Electrical Engineering and Computer Sciences, UC Berkeley

Location: MS 8359

Abstract:

In recent years, reinforcement learning algorithms have achieved a number of headline-grabbing empirical successes on a wide variety of complex tasks. However, applying the reinforcement learning paradigm to new problems is often challenging. It is unclear whether this is due to some fundamental statistical limits, or if we need better algorithms. In this talk I will present some recent results of my research towards establishing the theoretical foundations of reinforcement learning. I will focus on a type of reinforcement learning problem where the learner must identify a good decision-making strategy using an available dataset of pre-collected interactions with an environment. For this setting, I will discuss both some fundamental limits as well as some achievable statistical guarantees. I will conclude with a broad overview of my research in reinforcement learning in topics such as adaptivity and exploration, and mention some future directions.

Bio:

Andrea Zanette is a postdoctoral scholar in the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley, supported by a fellowship from the Foundation of Data Science Institute. He completed his PhD (2017-2021) in the Institute for Computational and Mathematical Engineering at Stanford University, advised by Prof Emma Brunskill and Mykel J. Kochenderfer. His PhD dissertation investigated modern Reinforcement Learning challenges such as exploration, function approximation, adaptivity, and learning from offline data. His work was supported by a Total Innovation Fellowship and his PhD thesis was awarded the Gene Golub Outstanding Dissertation Award from his department. Andrea’s background is in mechanical engineering. Before Stanford, he worked as a software developer in high-performance computing, as well as at the von Karman Institute for Fluid Dynamics, a NATO-affiliated international research establishment.

# Thursday, 01/19/2023, Time: 11:00am – 12:15pm PT Advancing hidden Markov models for the study of animal movement

Vianey Leos Barajas, Assistant Professor

Department of Statistical Sciences and School of the Environment, University of Toronto

Location: Young Hall CS50

Abstract:

In the study of animal movement, we are often interested in understanding how and where animals exhibit important behaviors, such as foraging, traveling and resting. Often, we can attempt to correlate an animal’s movement and behavior with their use of the landscape to understand how animals modify their behaviors to forage in particular locations, to rest in others, and understand the importance of these locations and environment for their survival and reproduction. Much of the data collected are of the form of high-frequency time series, such that hidden Markov models (HMMs) have been often employed to capture the correlation structure exhibited and further identify latent states that can serve as proxies for animal behavior. In this talk, I will discuss recent advances in HMMs (and their extensions) and how they advance the study of animal movement.

Bio:

Vianey Leos Barajas received her PhD in Statistics from Iowa State University in 2019, and was a postdoctoral researcher at North Carolina State University from 2019-2020. Since July 2020, she has been an Assistant Professor cross-appointed between the Department of Statistical Sciences and School of the Environment at the University of Toronto. She is primarily interested in working in the areas of ecological statistics, time series modeling, Bayesian statistics and, more recently, spatial modeling of environmental data. She is also a faculty affiliate at the Vector Institute for Artificial Intelligence and a faculty member of the Centre for Global Change Science at the University of Toronto.

# Tuesday, 01/17/2023, Time: 11:00am – 12:15pm PT Statistical modeling for single-cell multi-omics integration: a case study in CAR-T cell immunotherapy

Ye Zheng, Postdoctoral Research Fellow

Fred Hutchinson Cancer Center

Location: MS 8359

Abstract:

CD19-directed chimeric antigen receptor (CAR)-T cell immunotherapy has been established as an effective treatment to redirect T cells’ activity against tumors, yielding unprecedented response rates in patients with relapsed or refractory B cell malignancies. However, a considerable proportion of patients still do not achieve durable complete responses. Single-cell multi-omics analyses with cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq) are leveraged to investigate the intrinsic genomic features governing the proliferation capacity of CAR-T cell infusion products. To remove the batch effect of the CITE-seq data, a statistical normalization model, ADTnorm, is proposed, which also enables single-cell proteomics integration across studies. Moreover, to fulfill the goal of associating genomic features with patients’ response to CAR-T cell therapy, a tree-based model, scanCT, is constructed to identify the associated genomic features in a highly interpretable manner. Going beyond genomic feature detection, we developed a statistical multi-omics integration model to gain further insight into the gene regulation mechanisms that potentially drive the efficacy and neurotoxicity of CAR-T cell immunotherapy. Such an integration model bridges single-cell three-dimensional chromatin structure measurements with transcriptomics and epigenomics profiles. Therefore, the integration model enables the construction of a multi-modal cellular network on which the gene cis-regulations and associations with clinical outcomes can be inferred.

Bio:

Ye Zheng is currently a Postdoctoral Research Fellow at the Fred Hutchinson Cancer Center in Seattle. She is supervised by Drs. Raphael Gottardo and Steven Henikoff from both statistical and computational modeling and molecular biology perspectives. Her work involves statistical modeling of single-cell transcriptomics, proteomics and epigenomics data with application to immunology and immunotherapy. Before her postdoctoral training, Ye received the Ph.D. in statistics from the University of Wisconsin-Madison, supervised by Dr. Sunduz Keles. Her thesis focuses on statistical modeling and computational tools development to investigate the three-dimensional chromatin structure and the gene regulation mechanism.

# Thursday, 01/12/2023, Time: 11:00am – 12:15pm PT Stable Variable Selection with Knockoffs

Zhimei Ren, Postdoctoral Researcher

University of Chicago

Location: Young Hall CS 50

Abstract:

A common problem in many modern statistical applications is to find a set of important variables—from a pool of many candidates—that explain the response of interest. For this task, model-X knockoffs offers a general framework that can leverage any feature importance measure to produce a variable selection algorithm: it discovers true effects while rigorously controlling the number or fraction of false positives, paving the way for reproducible scientific discoveries. The model-X knockoffs, however, is a randomized procedure that relies on the one-time construction of synthetic (random) variables. Different runs of model-X knockoffs on the same dataset often result in different sets of selected variables, which is not desirable for the reproducibility of the reported results. In this talk, I will introduce derandomization schemes that aggregate the selection results across multiple runs of the knockoffs algorithm to yield stable selection. In the first part, I will present a derandomization scheme that controls the number of false positives, i.e., the per family error rate (PFER) and the k family-wise error rate (k-FWER). In the second part, I will talk about an alternative derandomization scheme with provable false discovery rate (FDR) control. Equipped with these derandomization steps, the knockoffs framework provides a powerful tool for making reproducible scientific discoveries. The proposed methods are evaluated on both simulated and real data, demonstrating comparable power and dramatically lower selection variability when compared with the original model-X knockoffs.

Bio:

Zhimei Ren is a postdoctoral researcher in the Statistics Department at the University of Chicago, advised by Professor Rina Foygel Barber. Before joining the University of Chicago, she obtained her Ph.D. in Statistics from Stanford University, under the supervision of Professor Emmanuel Candès. Her research interests lie broadly in multiple hypothesis testing, distribution-free inference, causal inference, survival analysis and data-driven decision-making.

# Tuesday, 01/10/2023, Time: 11:00am – 12:15pm PT From HeartSteps to HeartBeats: Personalized Decision-making

Raaz Dwivedi, Postdoctoral Fellow

Harvard University and MIT

Location: MS 8359

Abstract:

Ever-increasing access to data and computational power allows us to make decisions that are personalized to users by taking their behaviors and contexts into account. These developments are especially useful in domains like mobile health and medicine. For effective personalized decision-making, we need to revisit two fundamental tasks: (1) estimation and inference from data when there is no model for a decision’s effect on a user and (2) simulations when there is a known model for a decision’s effect on a user. Here we must overcome the difficulties facing classical approaches, namely statistical biases due to adaptively collected data and computational bottlenecks caused by high-dimensional models. This talk addresses both tasks. First, I provide a nearest-neighbor approach for unit-level statistical inference in sequential experiments. I also introduce a doubly robust variant of nearest neighbors that provides sharp error guarantees and helps measure a mobile app’s effectiveness in promoting healthier lifestyle with limited data. For the second task, I introduce kernel thinning, a practical strategy that provides near-optimal distribution compression in near-linear time. This method yields significant computational savings when simulating models of cardiac functioning.

Bio:

Raaz Dwivedi is a FODSI postdoctoral fellow advised by Prof. Susan Murphy and Prof. Devavrat Shah in CS and Statistics, Harvard and EECS, MIT respectively. He earned his Ph.D. at EECS, UC Berkeley, advised by Prof. Martin Wainwright and Prof. Bin Yu; and his bachelor’s degree at EE, IIT Bombay, advised by Prof. Vivek Borkar. His research builds statistically and computationally efficient strategies for personalized decision-making with theory and methods spanning the areas of causal inference, reinforcement learning, random sampling, and high-dimensional statistics. He won the President of India Gold Medal at IIT Bombay, the Berkeley Fellowship, teaching awards at UC Berkeley and Harvard, and a best student paper award for his work on optimal compression.

# Thursday, 12/01/2022, Time: 11:00am – 12:15pm PT Geometry of Dependency Equilibria

Bernd Sturmfels

Professor of Mathematics, Statistics and Computer Science, University of California, Berkeley

Director of the Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany

Location: Geology Building, Room 4660

Abstract:

An n-person game is specified by n tensors of the same format. Its equilibria are points in that tensor space. Dependency equilibria satisfy linear constraints on conditional probabilities. These cut out the Spohn variety, named after the philosopher who introduced the concept. Nash equilibria are tensors of rank one. We discuss the real algebraic geometry of the Spohn variety and its payoff map, with emphasis on connections to oriented matroids and algebraic statistics. This is joint work with Irem Portakal.

Bio:

Bernd Sturmfels received doctoral degrees in 1987 from the University of Washington and the Technical University Darmstadt, and an honorary doctorate in 2015 from the Goethe University Frankfurt. After postdoctoral years in Minneapolis and Linz, he taught at Cornell University, before joining UC Berkeley in 1995, where he served as Professor of Mathematics, Statistics and Computer Science. Since 2017 he has been a director at the Max-Planck Institute for Mathematics in the Sciences, Leipzig. In 2018 he became Honorary Professor at Technical University Berlin and University of Leipzig. His awards include a David and Lucile Packard Fellowship, a Humboldt Senior Research Prize, the SIAM von Neumann Lectureship, the Sarlo Distinguished Mentoring Award, and the George David Birkhoff Prize in Applied Mathematics. He is a fellow of the AMS and SIAM, and a member of the Berlin-Brandenburg Academy of Sciences. In 2022 he spoke at the International Congress of Mathematicians. Sturmfels mentored 60 doctoral students and numerous postdocs, and he authored 11 books and 300 research articles, in combinatorics, commutative algebra, algebraic geometry, and their applications to fields like statistics, optimization, and computational biology.

Here is a link to the slides used in the presentation.

# Thursday, 11/10/2022, Time: 11:00am – 12:15pm PT Neural Networks, Wide and Deep, Singular Kernels and Bayes Optimality

Mikhail Belkin, Professor

Halicioğlu Data Science Institute, University of California, San Diego

Location: Geology Building, Room 4660

Abstract:

Wide and deep neural networks are used in many important practical settings. In this talk I will discuss some aspects of width and depth related to optimization and generalization. I will first discuss what happens when neural networks become infinitely wide, giving a general result for the transition to linearity (i.e., showing that neural networks become linear functions of parameters) for a broad class of wide neural networks corresponding to directed graphs. I will then proceed to the question of depth, showing equivalence between infinitely wide and deep fully connected networks trained with gradient descent and Nadaraya-Watson predictors based on certain singular kernels. Using this connection we show that for certain activation functions these wide and deep networks are (asymptotically) optimal for classification but, interestingly, never for regression. (Based on joint work with Chaoyue Liu, Adit Radhakrishnan, Caroline Uhler and Libin Zhu.)

Bio:

Mikhail Belkin received his Ph.D. in 2003 from the Department of Mathematics at the University of Chicago. His research interests are in theory and applications of machine learning and data analysis. Some of his well-known work includes widely used Laplacian Eigenmaps, Graph Regularization and Manifold Regularization algorithms, which brought ideas from classical differential geometry and spectral analysis to data science. His recent work has been concerned with understanding remarkable mathematical and statistical phenomena observed in deep learning. This empirical evidence necessitated revisiting some of the basic concepts in statistics and optimization. One of his key recent findings is the “double descent” risk curve that extends the textbook U-shaped bias-variance trade-off curve beyond the point of interpolation. Mikhail Belkin is a recipient of a NSF Career Award and a number of best paper and other awards. He has served on the editorial boards of the Journal of Machine Learning Research, IEEE Pattern Analysis and Machine Intelligence and SIAM Journal on Mathematics of Data Science.

# Thursday, 11/03/2022, Time: 11:00am – 12:15pm PT Leveraging Machine Learning for Climate Modeling

Laure Zanna, Professor

Mathematics & Atmosphere/Ocean Science, Courant Institute, New York University

Location: Geology Building, Room 4660

Abstract:

Climate simulations, which solve approximations of the governing laws of fluid motions on a grid, remain one of the best tools to understand and predict global and regional climate change. Uncertainties in climate predictions originate partly from the poor or lacking representation of processes, such as ocean turbulence and clouds, that are not resolved in global climate models but impact the large-scale temperature, rainfall, sea level, etc. The representation of these unresolved processes has been a bottleneck in improving climate simulations and projections. The explosion of climate data and the power of machine learning (ML) algorithms are suddenly offering new opportunities: can we deepen our understanding of these unresolved processes and simultaneously improve their representation in climate models to reduce climate projections uncertainty? In this talk, I will discuss the advantages and challenges of using machine learning for climate projections. I will focus on our recent work in which we leverage ML tools to learn representations of unresolved ocean processes. Some of our work suggests that machine learning could open the door to discovering new physics from data and enhance climate predictions. Yet, many questions remain unanswered, making the next decade exciting and challenging for ML + climate modeling.

Bio:

Laure Zanna is a Professor in Mathematics & Atmosphere/Ocean Science at the Courant Institute, New York University. Her research focuses on the dynamics of the climate system and the main emphasis of her work is to study the influence of the ocean on local and global scales. Prior to NYU, she was a faculty member at the University of Oxford until 2019, and obtained her PhD in 2009 in Climate Dynamics from Harvard University. She was the recipient of the 2020 Nicholas P. Fofonoff Award from the American Meteorological Society “For exceptional creativity in the development and application of new concepts in ocean and climate dynamics”. She is the lead principal investigator of the NSF-NOAA Climate Process Team on Ocean Transport and Eddy Energy, and M2LInES – an international effort to improve climate models with scientific machine learning. She currently serves as an editor for the Journal of Climate, a member on the International CLIVAR Ocean Model Development Panel, and on the CESM Advisory Board.

# Thursday, 10/27/2022, Time: 11:00am – 12:15pm PT It’s Not What We Said, It’s Not What They Heard, It’s What They Say They Heard

Barry D. Nussbaum, Adjunct Professor at University of Maryland Baltimore County / Chief Statistician, Environmental Protection Agency (retired) / 2017 President, American Statistical Association

Location: Geology 4660

Abstract:

Statisticians have long known that success in our profession frequently depends on our ability to succinctly explain our results so decision makers may correctly integrate our efforts into their actions. However, this is no longer enough. While we still must make sure that we carefully present results and conclusions, the real difficulty is what the recipient thinks we just said. The situation becomes more challenging in the age of “big data”. This presentation will discuss what to do, and what not to do. Examples, including those used in court cases, executive documents, and material presented for the President of the United States, will illustrate the principles (This talk will NOT be recorded).

Bio:

Barry D. Nussbaum was the Chief Statistician for the U.S. Environmental Protection Agency from 2007 until his retirement in March 2016. He started his EPA career in 1975 in mobile sources and was the branch chief for the team that phased lead out of gasoline. Dr. Nussbaum is the founder of the EPA Statistics Users Group. In recognition of his notable accomplishments, he was awarded the Environmental Protection Agency’s Distinguished Career Service Award.

Dr. Nussbaum has a bachelor’s degree from Rensselaer Polytechnic Institute, and both a master’s and a doctorate from the George Washington University. In May 2015, he was elected the 112th president of the American Statistical Association. He has been a fellow of the ASA since 2007 and is an elected member of the International Statistical Institute. He has taught graduate statistics courses for George Washington University and Virginia Tech. In 2019, he was appointed Adjunct Professor of Mathematics and Statistics at the University of Maryland Baltimore County. In addition, he has even survived two terms as the treasurer of the Ravensworth Elementary School Parent Teacher Association.

# Thursday, 10/20/2022, Time: 11:00am – 12:15pm PT Kernel Ordinary Differential Equations

Xiaowu Dai, Assistant Professor

Department of Statistics, University of California, Los Angeles

Location: Geology 4660

Abstract:

Ordinary differential equations (ODEs) are widely used in modeling biological and physical processes in science. In this talk, I will discuss a new reproducing kernel-based approach for estimation and inference of ODEs given noisy observations. We do not assume the functional forms in the ODEs to be known, or restrict them to be linear or additive, and we allow pairwise interactions. We construct confidence intervals for the estimated signal trajectories, and establish the estimation optimality and selection consistency of kernel ODE under both the low-dimensional and high-dimensional settings, where the number of unknown functionals can be smaller or larger than the sample size. Our proposal builds upon the smoothing spline analysis of variance (SS-ANOVA) framework, but tackles several important problems that are not yet fully addressed, and thus extends the scope of existing SS-ANOVA too. Our proposal is also more advantageous, in terms of statistical inference with noisy observations, than the existing physics-informed neural networks and sparsity-based methods.

Bio:

Xiaowu Dai is an Assistant Professor in the Department of Statistics at the University of California, Los Angeles. He received his M.S. in Mathematics and M.S. in Computer Sciences from the University of Wisconsin-Madison, and earned his PhD in Statistics in 2019, advised by Grace Wahba. He was a postdoc in the Department of EECS and the Department of Economics at the University of California, Berkeley, working with Michael I. Jordan and Lexin Li. His research interests bridge statistics, machine learning, mechanism design, and dynamical systems.

# Tuesday, 10/06/2022, Time: 11:00am – 12:15pm PT Data Perturbation

Xiaotong Shen, Professor

School of Statistics, University of Minnesota

Location: Geology Building, Room 4660

Abstract:

Data perturbation is a technique for generating synthetic data by adding “noise” to original data, which has a wide range of applications, primarily in data security. Yet, it has not received much attention within data science. In this presentation, I will describe a fundamental principle of data perturbation that preserves the distributional information, thus ascertaining the validity of the downstream analysis and a machine learning task while protecting data privacy. Applying this principle, we derive a scheme to allow a user to perturb data nonlinearly while meeting the requirements of differential privacy and statistical analysis. It yields credible statistical analysis and high predictive accuracy of a machine learning task. Finally, I will highlight multiple facets of data perturbation through examples. This work is joint with B Xuan and R Shen.

Bio:

Dr. Xiaotong T. Shen is the John Black Johnston Distinguished Professor in the College of Liberal Arts at the University of Minnesota. His areas of interest include machine learning and data science, high-dimensional inference, nonparametric and semi-parametric inference, causal graphical models, personalization, recommender systems, natural language processing and text mining, and nonconvex minimization. His current research effort is devoted to the further development of causal and constrained inference, structured learning, inference for black-box learners, and scalable analysis. The target application areas are biomedical sciences, artificial intelligence, and engineering.

Dr. Shen is a Fellow of the American Association for the Advancement of Science (AAAS), a Fellow of the American Statistical Association (ASA), a Fellow of the Institute of Mathematical Statistics (IMS), and an Elected Member of the International Statistical Institute (ISI). He won the Best Paper Award (with Pan and Xie) of the International Biometric Society in 2012. He is recognized in the list of “20 Data Science Professors to Know” by OnlineEngineeringPrograms.com.

# Tuesday, 10/04/2022, Time: 11:00am – 12:15pm PT The role of Preferential Sampling in Spatial and Spatio-temporal Geostatistical Modeling

Alan E. Gelfand, Distinguished Emeritus Professor

Department of Statistical Science, Duke University

Location : Geology Building, Room 4660

**Abstract:**

The notion of preferential sampling was introduced into the literature in the seminal paper of Diggle et al. (2010). Subsequently, there has been considerable follow-up research. A standard illustration arises in geostatistical modeling. Consider the objective of inferring about environmental exposures. If environmental monitors are only placed in locations where environmental levels tend to be high, then interpolation based upon observations from these locations will necessarily produce only high predictions. A remedy lies in suitable spatial design of the locations, e.g., a random or space-filling design for locations over the region of interest is expected to preclude such bias. However, in practice, sampling may be designed in order to learn about areas of high exposure.
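The monitoring example above can be illustrated with a toy simulation. This is a sketch under stated assumptions, not the speaker's model: the one-dimensional exposure surface, the number of sites, and the exponential selection weights are all hypothetical. It compares the naive average of observations under a uniform design with the average under a design that favors high-exposure locations.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 1-D exposure surface on a fine grid over [0, 1].
grid = np.linspace(0.0, 1.0, 2001)
surface = np.sin(2 * np.pi * grid) + 0.5 * np.cos(6 * np.pi * grid)
true_mean = surface.mean()

n_sites = 200

# Uniform design: every grid location is equally likely to host a monitor.
uniform_idx = rng.choice(grid.size, size=n_sites, replace=False)

# Preferential design: selection probability increases with the exposure level.
weights = np.exp(2.0 * surface)
weights /= weights.sum()
pref_idx = rng.choice(grid.size, size=n_sites, replace=False, p=weights)

uniform_mean = surface[uniform_idx].mean()
pref_mean = surface[pref_idx].mean()

print("true mean of surface:     ", true_mean)
print("uniform-design estimate:  ", uniform_mean)
print("preferential estimate:    ", pref_mean)
```

In this toy setup the uniform design gives an estimate close to the true mean, while the preferential design overstates it, because the selected sites over-represent high-exposure regions.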

While the set of sampling locations may not have been developed randomly, we study it as if it were a realization of a spatial point process. That is, it may be designed/specified in some fashion but not necessarily with the intention of being roughly uniformly distributed over the study region D. Then, the question becomes a stochastic one: is the realization of the responses independent of the realization of the locations? If not, then we have what is called preferential sampling. Importantly, the dependence here is stochastic dependence. Notationally/functionally, the responses are associated with the locations.

Another setting is the case of species distribution modeling with a binary response, presence or absence, recorded at locations. Here, bias can arise when sampling is designed such that ecologists will tend to sample where they expect to find individuals. This setting can be extended to data fusion where we have both presence/absence data and presence-only data. Other potential applications include missing data settings and hedonic modeling for price with property sales. Very recent work explores preferential sampling in the context of multivariate geostatistical modeling.

Fundamental issues are: (i) can we identify the occurrence of a preferential sampling effect, (ii) can we adjust inference in the presence of preferential sampling, and (iii) when can such adjustment improve predictive performance over a customary geostatistical model? We consider these issues in a modeling context and illustrate with application to presence/absence data, to property sales, and to tree data where we observe mean diameter at breast height (MDBH) and trees per hectare (TPH). (This is joint work with Shinichiro Shirota and Lucia Paci.)

**Bio:**

Alan E. Gelfand is The James B Duke Professor Emeritus of Statistical Science at Duke University. He is former chair of the Department of Statistical Science (DSS) and enjoys a secondary appointment as Professor of Environmental Science and Policy in the Nicholas School. Author of more than 320 papers (more than 260 since 1990), Gelfand is internationally known for his contributions to applied statistics, Bayesian computation and Bayesian inference. (An article in Science Watch found him to be the tenth most cited mathematical scientist in the world over the period 1991-2001.) Gelfand is an Elected Fellow of the American Statistical Association, the Institute of Mathematical Statistics, and the International Society for Bayesian Analysis. He is an Elected Member of the International Statistical Institute. He is a former President of the International Society for Bayesian Analysis and in 2006 he received the Parzen Prize for a lifetime of research contribution to Statistics. In 2012, he was chosen to give the distinguished Mahalanobis lectures. In 2013, he received a Distinguished Achievement Medal from the ASA Section on Statistics in the Environment. In 2019, he received the S.S. Wilks Memorial Award from the American Statistical Association.

Gelfand’s primary research focus for the past twenty-five years has been in the area of statistical modeling for spatial and space-time data. Through a collection of more than 150 papers he has advanced methodology, using the Bayesian paradigm, to associate fully model-based inference with spatial and space-time data displays. His chief areas of application include environmental exposure, spatio-temporal ecological processes, and climate dynamics. He has four books in this area, including the successful “Hierarchical Modeling and Analysis for Spatial Data” with Sudipto Banerjee and Brad Carlin (now second edition), “Hierarchical Modeling for Environmental Data; Some Applications and Perspectives” with James Clark, the “Handbook of Spatial Statistics” with Peter Diggle, Montserrat Fuentes, and Peter Guttorp and the “Handbook of Environmental and Ecological Statistics” with Montserrat Fuentes, Jennifer Hoeting, and Richard Smith. In addition, he has an NSF-CBMS monograph with Erin Schliep entitled, “Bayesian Analysis and Computation for Spatial Point Patterns.”

# Thursday, 10/13/2022, Time: 11:00am – 12:15pm PT Regularized Stein Variational Gradient Descent and Langevin Diffusion

Krishna Balasubramanian, Assistant Professor

Department of Statistics, University of California, Davis

Location : Geology Building, Room 4660

Abstract:

Sampling is a fundamental task in statistics and applied mathematics. In the recent past, deterministic, particle-based sampling algorithms have been proposed as alternatives to randomized MCMC methods. A canonical example of a deterministic sampling algorithm is the Stein Variational Gradient Descent (SVGD). However, theoretical properties of the SVGD algorithm and comparisons to more standard MCMC methods are largely unknown.

In this talk, I will present a regularized RKHS formulation of the SVGD algorithm and present several results in a certain mean-field limit setting. I will start with a motivation for the proposed regularization method and highlight how the regularized SVGD algorithm relates to the Langevin diffusion (whose discretization gives the Langevin Monte Carlo algorithm, a popular MCMC method). Next, I will discuss convergence results for the regularized SVGD algorithm in both the continuous-time and discrete-time settings, and compare with similar results for the MCMC methods.
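As background for the two samplers named above, the following is a minimal sketch of the standard (unregularized) SVGD update with an RBF kernel next to an unadjusted Langevin Monte Carlo step. It does not implement the regularized formulation from the talk. The one-dimensional Gaussian target, the fixed kernel bandwidth, the step sizes, and the function names are assumptions chosen for illustration.

```python
import numpy as np

MU, SIGMA = 2.0, 1.0  # hypothetical target: N(2, 1)


def score(x):
    """Gradient of the log-density of the Gaussian target."""
    return -(x - MU) / SIGMA**2


def svgd_step(x, step, bandwidth):
    """One deterministic SVGD update of all particles (standard form)."""
    diff = x[:, None] - x[None, :]             # diff[j, i] = x_j - x_i
    k = np.exp(-diff**2 / (2 * bandwidth**2))  # RBF kernel k(x_j, x_i)
    grad_k = -diff / bandwidth**2 * k          # derivative of k w.r.t. x_j
    # Kernel-weighted score (attraction) plus kernel gradient (repulsion),
    # averaged over the particles j for each particle i.
    phi = (k * score(x)[:, None] + grad_k).mean(axis=0)
    return x + step * phi


def langevin_step(x, step, rng):
    """One unadjusted Langevin Monte Carlo step for independent chains."""
    noise = rng.standard_normal(x.shape)
    return x + step * score(x) + np.sqrt(2 * step) * noise


rng = np.random.default_rng(0)
init = rng.normal(-5.0, 1.0, size=100)  # start far from the target

svgd_particles = init.copy()
lmc_particles = init.copy()
for _ in range(1000):
    svgd_particles = svgd_step(svgd_particles, step=0.1, bandwidth=1.0)
    lmc_particles = langevin_step(lmc_particles, step=0.05, rng=rng)

print("SVGD mean/std:", svgd_particles.mean(), svgd_particles.std())
print("LMC  mean/std:", lmc_particles.mean(), lmc_particles.std())
```

The contrast to note is that the SVGD particles interact through the kernel and move deterministically, while the Langevin chains evolve independently and rely on injected Gaussian noise.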

Bio:

Krishna Balasubramanian is an assistant professor in the Department of Statistics, University of California, Davis. His recent research interests include stochastic optimization and sampling, geometric and topological statistics, and kernel methods. His research was/is supported by a Facebook PhD fellowship, and CeDAR and NSF grants.