2024–2025 Academic Year

Friday 05/30/25, Time: 11:00am – 12:15pm, Semiparametric M-estimation with Overparameterized Neural Networks

Location: CHS 73-105

Fang Yao, Chair Professor
Department of Probability & Statistics, Peking University

Abstract:

We focus on semiparametric regression, which has played a central role in statistics, and exploit the powerful learning ability of deep neural networks (DNNs) while enabling statistical inference on parameters of interest, which offers interpretability. Despite the success of classical semiparametric methods and theory, establishing the root-n consistency and asymptotic normality of the finite-dimensional parameter estimator in this context remains challenging, mainly due to nonlinearity and potential tangent space degeneration in DNNs. In this work, we introduce a foundational framework for semiparametric M-estimation, leveraging the approximation ability of overparameterized neural networks, which circumvent tangent degeneration and align better with current training practice. The optimization properties of general loss functions are analyzed, and global convergence is guaranteed. Instead of studying the “ideal” minimizer of an objective function, as in most of the literature, we analyze the statistical properties of algorithmic estimators, and establish nonparametric convergence and parametric asymptotic normality for a broad class of loss functions. These results hold without assuming the boundedness of the network output, and even when the true function lies outside the specified function space. To illustrate the applicability of the framework, we also provide examples from regression and classification, and numerical experiments provide empirical support for the theoretical findings.
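
To make the setup concrete, here is a minimal sketch (illustrative only, not the speaker's framework) of semiparametric M-estimation in a partially linear model, where a finite-dimensional parameter theta is estimated jointly with a wide, overparameterized network standing in for the nonparametric component; the model, architecture, and training details are all assumptions for this toy example.

```python
# Illustrative sketch (not the speaker's method): semiparametric M-estimation
# in a partially linear model y = x*theta + g(z) + noise, where theta is the
# parameter of interest and g is fit with a wide "overparameterized" network.
import torch

torch.manual_seed(0)
n = 2000
x = torch.randn(n, 1)                       # covariate with linear effect
z = torch.rand(n, 1)                        # covariate entering nonparametrically
g_true = torch.sin(4 * torch.pi * z)
y = 1.5 * x + g_true + 0.3 * torch.randn(n, 1)   # true theta = 1.5

theta = torch.zeros(1, requires_grad=True)
g_net = torch.nn.Sequential(                # wide, shallow network for g
    torch.nn.Linear(1, 512), torch.nn.ReLU(), torch.nn.Linear(512, 1)
)
opt = torch.optim.Adam([theta, *g_net.parameters()], lr=1e-3)

for _ in range(3000):                       # minimize the squared-error M-loss
    opt.zero_grad()
    loss = ((y - x * theta - g_net(z)) ** 2).mean()
    loss.backward()
    opt.step()

print(f"estimated theta: {theta.item():.3f} (truth 1.5)")
```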

Bio:

Dr. Fang Yao is Chair Professor in Statistics at Peking University (PKU), serving as the Department Head of Probability & Statistics and the Director of the Center for Statistical Science at PKU. He is a Fellow of the Institute of Mathematical Statistics (IMS) and the American Statistical Association (ASA). He received his B.S. degree in 2000 from the University of Science and Technology of China, and his Ph.D. degree in Statistics in 2003 from the University of California, Davis. He was a tenured Full Professor in Statistical Science at the University of Toronto before joining PKU. Dr. Yao’s research focuses on complex-structured data analysis, including functional, high-dimensional, manifold and non-Euclidean data objects; incorporating machine/deep learning and partial/ordinary differential equations to establish scalable statistical modeling and inference; and conducting applications involving functional, high-dimensional and differential dynamics in biomedical studies, human genetics, neuroimaging, finance, economics, and engineering. He received the CRM-SSC Prize in Canada in 2014 and the 6th Xplorer Prize in 2024. He served as the Editor of the Canadian Journal of Statistics (2019-2021), and is or was on the editorial boards of a number of statistical journals, including the Annals of Statistics and the Journal of the American Statistical Association.

Thursday 05/29/25, Time: 11:00am – 12:15pm, Distributional Off-Policy Evaluation in Reinforcement Learning

Location: Franz Hall 2258A

Lan Wang, Centennial Endowed Chair Professor
Miami Herbert Business School, University of Miami

Abstract:

In the existing reinforcement learning (RL) literature, off-policy evaluation mainly focuses on estimating the value (e.g., an expected discounted cumulative reward) of a target policy, given pre-collected data generated by some behavior policy. Motivated by the recent success of distributional RL in many practical applications, we study the distributional off-policy evaluation problem in the batch setting when the reward is multivariate. We propose an offline Wasserstein-based approach to simultaneously estimate the joint distribution of a multivariate discounted cumulative reward given any initial state-action pair in the setting of an infinite-horizon Markov decision process. A finite-sample error bound for the proposed estimator with respect to a modified Wasserstein metric is established in terms of both the number of trajectories and the number of decision points on each trajectory in the batch data. Extensive numerical studies are conducted to demonstrate the superior performance of the proposed method (joint work with Zhengling Qi, Chenjia Bai, and Zhaoran Wang).
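
As a toy illustration of the objects involved (not the proposed estimator, which handles multivariate rewards and off-policy correction), the following sketch computes Monte Carlo discounted returns from batch trajectories and measures the distance between two return distributions with a one-dimensional Wasserstein metric; the data-generating process is invented for illustration.

```python
# Toy illustration only (not the proposed estimator): discounted cumulative
# returns from batch trajectories, and a 1-D Wasserstein distance between two
# estimated return distributions.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
gamma, T, n_traj = 0.95, 50, 500

def sample_returns(reward_mean):
    """Monte Carlo discounted returns under some data-generating policy."""
    rewards = rng.normal(reward_mean, 1.0, size=(n_traj, T))
    discounts = gamma ** np.arange(T)
    return rewards @ discounts              # shape (n_traj,)

returns_a = sample_returns(1.0)             # e.g., behavior-policy returns
returns_b = sample_returns(1.2)             # e.g., target-policy returns

# Distance between the two empirical return distributions.
print(wasserstein_distance(returns_a, returns_b))
```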

Bio:

Dr. Lan Wang is the Centennial Endowed Chair Professor and Chair of the Department of Management Science at the Miami Herbert Business School of the University of Miami, with a secondary appointment as Professor of Public Health Sciences at the Miller School of Medicine, University of Miami. Dr. Wang is an elected Fellow of the American Statistical Association, an elected Fellow of the Institute of Mathematical Statistics, and an elected member of the International Statistical Institute. She currently serves on the editorial board of the Journal of the Royal Statistical Society, and she served as Co-Editor of the Annals of Statistics (2022-2024). Dr. Wang’s research covers several interrelated areas: high-dimensional statistical learning, quantile regression, reinforcement learning, optimal personalized decision recommendation, survival analysis, and business analytics. She is also interested in interdisciplinary collaboration, driven by applications in business, economics, health care, and other domains.

Thursday 05/22/25, Time: 11:00am – 12:15pm, Conditional Distributional Learning with Non-crossing Quantile Network and Applications

Location: Franz Hall 2258

Hongtu Zhu, Professor
Department of Biostatistics, University of North Carolina at Chapel Hill

Abstract:

We introduce the Non-Crossing Quantile (NQ) Network, a novel approach for conditional distribution learning. By incorporating non-negative activation functions, the NQ network ensures monotonicity in learned distributions, effectively eliminating the issue of quantile crossing. The NQ network offers a highly adaptable deep distributional learning framework, applicable to a wide range of tasks, from non-parametric quantile regression to causal effect estimation and distributional reinforcement learning (RL). We further establish a comprehensive theoretical foundation for the deep NQ estimator and its application in distributional RL, providing rigorous analysis to support its effectiveness. Extensive experiments demonstrate the robustness and versatility of the NQ network across various domains, including clinical trials, e-commerce, games, and healthcare, highlighting its potential for real-world applications. This is based on a series of joint works with Drs. Shen, Luo, and Shi and Mr. Huang.
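
For a concrete picture of the non-crossing device, here is a minimal sketch in the spirit of the abstract (the architecture details are assumed, not taken from the paper): the network outputs a base quantile plus non-negative increments, so the K estimated quantile levels are monotone by construction. In practice such a network would be trained with a pinball (check) loss at levels τ1 < … < τK.

```python
# Minimal sketch of a non-crossing quantile network (details assumed for
# illustration): a non-negative activation on the increments plus a cumulative
# sum makes the K quantile outputs monotone in the quantile level by design.
import torch

class NonCrossingQuantileNet(torch.nn.Module):
    def __init__(self, d_in, n_quantiles, width=64):
        super().__init__()
        self.body = torch.nn.Sequential(
            torch.nn.Linear(d_in, width), torch.nn.ReLU(),
            torch.nn.Linear(width, n_quantiles),
        )

    def forward(self, x):
        raw = self.body(x)
        base = raw[:, :1]                                       # lowest quantile
        increments = torch.nn.functional.softplus(raw[:, 1:])   # >= 0
        return torch.cat([base, base + increments.cumsum(dim=1)], dim=1)

net = NonCrossingQuantileNet(d_in=3, n_quantiles=9)
q = net(torch.randn(5, 3))
assert (q[:, 1:] >= q[:, :-1]).all()                            # never crosses
```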

Bio:

Dr. Hongtu Zhu is a Professor of Biostatistics, Statistics, Radiology, Computer Science, and Genetics at the University of North Carolina at Chapel Hill. He previously served as a DiDi Fellow and Chief Scientist of Statistics at DiDi Chuxing (2018–2020) and held the Endowed Bao-Shan Jing Professorship in Diagnostic Imaging at MD Anderson Cancer Center (2016–2018). Dr. Zhu is a Fellow of IEEE, ASA, and IMS and currently serves as the Coordinating Editor of JASA and the Editor of JASA Applications and Case Studies. He has received several prestigious awards, including the Established Investigator Award from the Cancer Prevention Research Institute of Texas (2016) and the INFORMS Daniel H. Wagner Prize for Excellence in Operations Research Practice (2019). Dr. Zhu has authored over 350 publications in top journals such as Nature, Science, Cell, and Nature Genetics as well as all major statistical journals, and has presented more than 55 conference papers at leading machine learning and AI conferences, including NeurIPS, KDD, AAAI, ICML, and ICLR.

Friday 05/16/25, Time: 11:00am – 12:15pm, Nonparametric Expected Shortfall Regression

Location: Mathematical Sciences Building 8359

Wen-Xin Zhou, Associate Professor
Department of Information and Decision Sciences, University of Illinois Chicago

Abstract:

Expected Shortfall (ES), also known as superquantile or Conditional Value-at-Risk, has been recognized as an important risk measure in economics and finance. In this talk, we consider a joint regression framework that simultaneously models the conditional quantile and ES of a response variable given a set of covariates, for which the state-of-the-art approach is based on minimizing a joint loss function that is non-differentiable and non-convex. Motivated by the idea of using orthogonal scores to reduce sensitivity to nuisance parameters, we study a two-step framework for fitting joint quantile and ES regression models nonparametrically, both over reproducing kernel Hilbert spaces (RKHSs) and with deep neural networks. We establish a non-asymptotic theory for the proposed estimators, carefully characterizing the impact of quantile estimation without relying on sample splitting. For ES kernel ridge regression, we further propose a fast inference method to construct pointwise confidence bands. For neural network-based ES regression, we introduce a Huberized estimator that is robust against heavy tails in the response distribution. A Python package, quantes (https://pypi.org/project/quantes/), has been developed to implement various ES regression methods.
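
The two-step idea can be sketched generically as follows (this is illustrative and deliberately does not guess at the quantes API): first fit a conditional τ-quantile, then run least squares on a surrogate response whose conditional mean equals the τ-level expected shortfall when the quantile fit is correct.

```python
# Generic two-step ES regression sketch (illustrative; not the quantes
# package API). Step 1 fits a conditional tau-quantile; step 2 runs least
# squares on a surrogate Z whose conditional mean equals the tau-level ES
# when the quantile fit is correct: Z = q(X) + (Y - q(X)) 1{Y <= q(X)} / tau.
import numpy as np
from sklearn.linear_model import LinearRegression, QuantileRegressor

rng = np.random.default_rng(0)
n, tau = 5000, 0.1
X = rng.normal(size=(n, 2))
Y = X @ np.array([1.0, -0.5]) + rng.normal(size=n)   # left tail = risk of interest

# Step 1: conditional quantile fit.
qr = QuantileRegressor(quantile=tau, alpha=0.0).fit(X, Y)
q_hat = qr.predict(X)

# Step 2: least squares on the ES surrogate response.
Z = q_hat + (Y - q_hat) * (Y <= q_hat) / tau
es_reg = LinearRegression().fit(X, Z)
print("ES regression coefficients:", es_reg.coef_)
```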

Bio:

Dr. Wenxin Zhou is an Associate Professor in the Department of Information and Decision Sciences at the University of Illinois Chicago. Before joining UIC, he was a faculty member at UC San Diego. His research focuses on high-dimensional statistical inference, nonparametric methods, robust statistics, and quantile regression. He currently serves as an Associate Editor for the Journal of the Royal Statistical Society Series B and the Annals of Applied Probability. He earned his PhD from the Hong Kong University of Science and Technology.

Thursday 05/15/25, Time: 3:00pm – 4:30pm, De Leeuw Seminar: Veridical Data Science and PCS Uncertainty Quantification

Location: Luskin Conference Center – Legacy Room

Bin Yu, CDSS Chancellor’s Distinguished Professor in Statistics, EECS, and Computational Biology, UC Berkeley

Abstract:

Data Science is central to AI and has driven most of the recent advances in biomedicine and beyond. Human judgment calls are ubiquitous at every step of the data science life cycle (DSLC): problem formulation, data cleaning, EDA, modeling, and reporting. Such judgment calls are often responsible for the “dangers” of AI by creating a universe of hidden uncertainties well beyond sample-to-sample uncertainty. To mitigate these dangers, veridical (truthful) data science is introduced based on three key principles: Predictability, Computability and Stability (PCS). The PCS framework (including documentation) unifies, streamlines, and expands on the ideas and best practices of statistics and machine learning. This talk showcases PCS uncertainty quantification (PCS-UQ) with applications to prediction, including deep learning. It compares PCS-UQ with, and makes connections to, Conformal Prediction (CP). Over a collection of 17 regression tabular datasets, 6 multi-class tabular datasets, and 4 deep learning datasets, PCS-UQ reduces the size of the prediction intervals or sets by around 20% on average when compared to the best CP method among the ones used by PCS-UQ, and has better subgroup coverage than CP overall.
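
For context, the split-conformal baseline that PCS-UQ is compared against can be stated in a few lines; the sketch below is the generic CP recipe, not PCS-UQ itself.

```python
# Minimal split conformal prediction (the generic CP baseline; PCS-UQ itself
# is not sketched here). Gives marginal 90% coverage under exchangeability,
# regardless of how good the fitted model is.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
y = X[:, 0] ** 2 + rng.normal(size=2000)

X_fit, X_cal, y_fit, y_cal = train_test_split(X, y, test_size=0.5, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X_fit, y_fit)

alpha = 0.1
scores = np.abs(y_cal - model.predict(X_cal))            # conformity scores
k = int(np.ceil((len(scores) + 1) * (1 - alpha)))        # finite-sample quantile
q_hat = np.sort(scores)[k - 1]

x_new = rng.normal(size=(1, 5))
pred = model.predict(x_new)[0]
print(f"90% prediction interval: [{pred - q_hat:.2f}, {pred + q_hat:.2f}]")
```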

Bio:

Bin Yu is CDSS Chancellor’s Distinguished Professor in Statistics, EECS, and Computational Biology, and Scientific Advisor at the Simons Institute for the Theory of Computing, all at UC Berkeley. Her research focuses on the practice and theory of statistical machine learning, veridical data science, responsible and safe AI, and solving interdisciplinary data problems in neuroscience, genomics, and precision medicine. She and her team have developed algorithms such as iterative random forests (iRF), stability-driven NMF, and adaptive wavelet distillation (AWD) from deep learning models. She is a member of the National Academy of Sciences and of the American Academy of Arts and Sciences. She was a Guggenheim Fellow, IMS President, and delivered the IMS Rietz and Wald Lectures and the Distinguished Achievement Award and Lecture (formerly the Fisher Lecture) of COPSS. She holds an Honorary Doctorate from the University of Lausanne.

Thursday 05/08/25, Time: 11:00am – 12:15pm, Online Learning for Dynamic Assortment Selection with Positioning

Location: Franz Hall 2258A

Yufeng Liu, Professor
Departments of Statistics, Biostatistics and Genetics, University of North Carolina at Chapel Hill

Abstract:

Online decision-making is an interdisciplinary field combining statistics, operations research, computer science, and machine learning. Motivated by abundant e-business data, it offers a powerful framework for market design problems such as dynamic pricing and assortment selection. In this talk, we introduce Dynamic Assortment Selection with Positioning (DAP), a new online learning problem where customer choices depend on both item preferences and position effects modeled through a multinomial logit model. We show that ignoring position effects leads to linear regret and propose the Truncated Linear Regression Upper Confidence Bound (TLR-UCB) policy, which leverages a novel geometric linear-bandit feedback structure. We establish near-optimal regret bounds for TLR-UCB and further develop Explore-In-TLR-UCB (EI-TLR) to address unknown position effects. Extensive experiments demonstrate that both TLR-UCB and EI-TLR outperform existing benchmarks. This talk is based on joint work with Yiyun Luo and Will Wei Sun.
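
A toy version of a multinomial logit choice model with position effects looks like the following; the parameterization (item utility plus a slot-specific effect, with an outside no-purchase option) is assumed for illustration and is not the paper's exact model.

```python
# Toy multinomial-logit choice model with position effects (parameterization
# assumed for illustration; not the paper's exact model). The probability of
# choosing the item shown in slot k depends on both the item's features and
# a slot-specific position effect, plus an outside "no purchase" option.
import numpy as np

rng = np.random.default_rng(0)
n_items, d, n_slots = 10, 3, 4
item_features = rng.normal(size=(n_items, d))
beta = np.array([0.8, -0.5, 0.3])                 # preference parameters
position_effect = np.array([0.6, 0.3, 0.1, 0.0])  # earlier slots more salient

def choice_probs(assortment):
    """P(choose item in slot k) for a displayed assortment, MNL with slots."""
    utilities = item_features[assortment] @ beta + position_effect
    expu = np.exp(utilities)
    denom = 1.0 + expu.sum()                      # "1" is the no-purchase option
    return np.append(expu / denom, 1.0 / denom)

probs = choice_probs(assortment=[2, 7, 0, 5])
print(probs, probs.sum())                         # last entry = P(no purchase)
```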

Bio:

Dr. Yufeng Liu is currently Professor of Statistics, Biostatistics, and Genetics at the University of North Carolina at Chapel Hill (UNC). His research interests include statistical machine learning, complex data analysis, precision medicine, bioinformatics, and e-commerce. He has published more than 150 papers in top statistical research and application journals. Dr. Liu has served as principal investigator for multiple research grants from the National Science Foundation and the National Institutes of Health. He received the CAREER Award from the National Science Foundation in 2008, the Ruth and Phillip Hettleman Prize for Artistic and Scholarly Achievement from UNC in 2010, and the inaugural Leo Breiman Junior Award from the American Statistical Association in 2017. He is an elected Fellow of the American Statistical Association (ASA) and the Institute of Mathematical Statistics (IMS), and an Elected Member of the International Statistical Institute (ISI).

Thursday 05/01/25, Time: 11:00am – 12:15pm, Expected Shortfall Regression

Location: Franz Hall 2258A

Xuming He, Chair and Kotzubei-Beckmann Distinguished Professor
Department of Statistics and Data Science, Washington University in St. Louis

Abstract:

Expected shortfall, measuring the average outcome (e.g., portfolio loss) above a given quantile of its probability distribution, is a common financial risk measure. The same measure can be used to characterize treatment effects in the tail of an outcome distribution, with applications ranging from policy evaluation in economics and public health to biomedical investigations. Expected shortfall regression is a natural approach to modeling covariate-adjusted expected shortfalls. Because the expected shortfall cannot be written as the minimizer of an expected loss function at the population level, computational as well as statistical challenges around expected shortfall regression have led to stimulating research. We discuss some recent developments in this area, with a focus on a new optimization-based semiparametric approach to the estimation of conditional expected shortfall that adapts well to data heterogeneity with minimal model assumptions. The talk is based on joint work with Yuanzhi Li and Shushu Zhang.

Bio:

Dr. Xuming He is the Kotzubei-Beckmann Distinguished Professor and Inaugural Chair of Statistics and Data Science at Washington University in St. Louis. He currently serves as President (2023–2025) of the International Statistical Institute (ISI) and as Joint Editor of the Journal of the Royal Statistical Society Series B. His research interests encompass theory and methodology in robust statistics, semiparametric modeling, Bayesian computation and inference, and adaptive sampling and data analysis. Through his interdisciplinary research, he aims to promote the better use of statistics in biosciences, public health, climate studies, and socio-economic research.

He received his PhD in Statistics from the University of Illinois at Urbana-Champaign (UIUC). His prior appointments include the H. C. Carver Collegiate Professor of Statistics at the University of Michigan, faculty positions at UIUC and the National University of Singapore, and Program Director of Statistics at the National Science Foundation, USA. He is an elected Fellow of the American Association for the Advancement of Science (AAAS), the American Statistical Association (ASA), and the Institute of Mathematical Statistics (IMS), and an Elected Member of the ISI. His recent honors include the Founders Award (2021) from the ASA, the Distinguished Faculty Achievement Award (2021) as well as a Rackham Distinguished Graduate Mentor Award (2021) from the University of Michigan, and the Carver Medal (2022) from the IMS for his decades-long contributions to the statistics profession.

Thursday 04/24/25, Time: 11:00am – 12:15pm, Spectral Ranking Inferences Based on General Multiway Comparisons

Location: Franz Hall 2258A

Jianqing Fan, Frederick L. Moore ’18 Professor of Finance, Professor of Statistics, Professor of Operations Research and Financial Engineering, Princeton University

Abstract:

This paper studies the performance of the spectral method in the estimation and uncertainty quantification of the unobserved preference scores of compared entities in a general and more realistic setup. Specifically, the comparison graph consists of hyper-edges of possible heterogeneous sizes, and the number of comparisons can be as low as one for a given hyper-edge. Such a setting is pervasive in real applications, circumventing the need to specify the graph randomness and the restrictive homogeneous sampling assumption imposed in the commonly used Bradley-Terry-Luce (BTL) or Plackett-Luce (PL) models. Furthermore, in scenarios where the BTL or PL models are appropriate, we unravel the relationship between the spectral estimator and the Maximum Likelihood Estimator (MLE). We discover that a two-step spectral method, where we apply the optimal weighting estimated from the equal weighting vanilla spectral method, can achieve the same asymptotic efficiency as the MLE. Given the asymptotic distributions of the estimated preference scores, we also introduce a comprehensive framework to carry out both one-sample and two-sample ranking inferences, applicable to both fixed and random graph settings. It is noteworthy that this is the first time effective two-sample rank testing methods have been proposed. Finally, we substantiate our findings via comprehensive numerical simulations and subsequently apply our developed methodologies to perform statistical inferences for statistical journals and movie rankings.

(Joint work with Zhipeng Lou, Weichen Wang, and Mengxin Yu)
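
For intuition about what “spectral” means here, the following sketch implements the pairwise special case in the style of rank centrality (the talk's setting is far more general, allowing hyper-edges of heterogeneous sizes): build a Markov chain from comparison outcomes and read the preference scores off its stationary distribution.

```python
# Spectral ranking in the pairwise special case (rank-centrality-style sketch;
# the talk covers general multiway comparisons). Build a Markov chain whose
# transitions follow empirical win counts; its stationary distribution gives
# the preference scores, and sorting them gives the ranking.
import numpy as np

rng = np.random.default_rng(0)
n = 5
true_scores = np.exp(rng.normal(size=n))           # BTL-type latent scores

# Simulate pairwise comparisons: wins[i, j] counts times j beat i.
wins = np.zeros((n, n))
for _ in range(2000):
    i, j = rng.choice(n, size=2, replace=False)
    p_j = true_scores[j] / (true_scores[i] + true_scores[j])
    winner, loser = (j, i) if rng.random() < p_j else (i, j)
    wins[loser, winner] += 1

# Transition matrix: move from i to j proportionally to j's wins over i.
d_max = (wins + wins.T).sum(axis=1).max()
P = wins / d_max
np.fill_diagonal(P, 1.0 - P.sum(axis=1))

# Stationary distribution via the leading left eigenvector.
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi = pi / pi.sum()
print("estimated ranking:", np.argsort(-pi), "true:", np.argsort(-true_scores))
```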

Bio:

Jianqing Fan is the Frederick L. Moore Professor of Finance, Professor of Operations Research and Financial Engineering, and former Chairman of the Department of Operations Research and Financial Engineering at Princeton University, where he directs both the financial econometrics and statistics labs. After receiving his Ph.D. from the University of California at Berkeley, he was appointed as assistant, associate, and full professor at the University of North Carolina at Chapel Hill (1989-2003), professor at the University of California at Los Angeles (1997-2000), professor and chair at the Chinese University of Hong Kong (2000-2003), and professor at Princeton University (2003–). He is a past president of the Institute of Mathematical Statistics and of the International Chinese Statistical Association. He is the joint editor of the Journal of the American Statistical Association and was co-editor of the Annals of Statistics, Probability Theory and Related Fields, the Econometrics Journal, the Journal of Econometrics, and the Journal of Business & Economic Statistics. His published work in statistics, machine learning, economics, finance, and computational biology has been recognized by the 2000 COPSS Presidents’ Award, the 2007 Morningside Gold Medal of Applied Mathematics, a Guggenheim Fellowship in 2009, the P.L. Hsu Prize in 2013, the Royal Statistical Society Guy Medal in Silver in 2014, the Noether Distinguished Scholar Award in 2018, the Le Cam Award and Lecture in 2021, the Frontiers of Science Award in 2024, and the Wald Award and Lecture in 2025, as well as by his election as Academician of Academia Sinica, member of the Royal Academy of Belgium, and fellow of the American Association for the Advancement of Science, the Institute of Mathematical Statistics, the American Statistical Association, and the Society for Financial Econometrics. His research interests include high-dimensional statistics, data science, machine learning, mathematics of AI, financial economics, and computational biology. He has coauthored 4 books and published over 300 highly cited papers.

Thursday 04/17/25, Time: 11:00am – 12:15pm, Fusion Learning: Fusing Inferences from Diverse Data Sources

Location: Franz Hall 2258A

Regina Liu, Distinguished Professor
Department of Statistics, Rutgers University

Abstract:

Advanced data acquisition technology has greatly increased the accessibility of complex inferences, based on summary statistics or sample data, from diverse data sources. Fusion learning refers to combining complex inferences from multiple sources to make a more effective overall inference for the parameters of interest. We focus on three tasks: 1) whether and when to combine inferences; 2) how to combine inferences efficiently; and 3) how to combine inferences to enhance an individual study (hence the name i-Fusion).

We present a general framework for nonparametric and efficient fusion learning. The main tool underlying this framework is the new notion of a depth confidence distribution (depth-CD), developed by combining data depth, the bootstrap, and confidence distributions. We show that a depth-CD is an omnibus form of confidence regions, whose contours of level sets shrink toward the true parameter value, and is thus an all-encompassing inferential tool. The approach is efficient, general and robust, and readily applies to heterogeneous studies with a broad range of complex settings. The approach is demonstrated with an aviation safety analysis application in tracking aircraft landing performance, and with zero-event studies in clinical trials with non-estimable parameters.

This is joint work with Dungang Liu (U. Cincinnati) and Minge Xie (Rutgers University).

Bio:

Regina Liu is a Distinguished Professor at Rutgers University. Her research areas include data depth, resampling, nonparametric statistics, confidence distributions, and fusion learning. Aside from theoretical and methodological research, she has long collaborated with the FAA on aviation safety research projects on process control, text mining and risk management.

She is an elected fellow of the Institute of Mathematical Statistics (IMS) and the American Statistical Association (ASA). Among other distinctions, she is the recipient of the 2021 Noether Distinguished Scholar Award (ASA), the 2024 Elizabeth Scott Award (Committee of Presidents of Statistical Societies, COPSS), and the 2011 Stieltjes Professorship of the Thomas Stieltjes Institute for Mathematics in the Netherlands, and she has also delivered an IMS Medallion Lecture. She has served as Co-Editor of the Journal of the American Statistical Association and as Associate Editor for several journals. She served as President of the Institute of Mathematical Statistics for 2020-2021.

Thursday 04/10/25, Time: 11:00am – 12:15pm, Statistical models for analyzing large-scale human genomic data

Location: Franz Hall 2258A

Sriram Sankararaman, Professor
Department of Computer Science at UCLA

Abstract:

The quest to understand the interplay between genes and diseases has been revolutionized by Biobanks that collect genetic data paired with traits across millions of individuals in diverse populations. However, analyses of these Biobank-scale datasets present substantial statistical and computational challenges.

I will describe how we bring together statistical and computational insights to design accurate and highly scalable algorithms for a suite of problems that arise in the analysis of Biobank-scale data: highly scalable randomized inference algorithms to infer the genetic architecture of complex traits, and deep-learning-based phenotype imputation to deal with complex patterns of missingness. By applying these methods to about half a million individuals from the UK Biobank, we obtain novel insights into the contributions of additive, dominance, gene-gene and gene-environment interaction effects to trait variation, and discover new genetic risk factors for hard-to-measure diseases.

Bio:

Sriram Sankararaman is a professor in the Departments of Computer Science, Human Genetics, and Computational Medicine at UCLA. He develops statistical and computational methods to make sense of complex, high-dimensional datasets in genomics and biomedicine, with the broad goal of understanding the interplay between evolution, genomes, and traits. His work has led to the identification of disease genes in human populations, to the discovery of interbreeding between humans and Neanderthals, and to guidelines for how genetic data can be shared without compromising privacy. He is a recipient of an NSF CAREER Award, an NIH Outstanding Investigator Award, and fellowships from Microsoft Research and the Sloan Foundation, as well as an Excellence in Teaching Award from the UCLA School of Engineering.

Thursday 04/03/25, Time: 11:00am – 12:15pm, Statistical Neuroimaging Analysis: An Overview

Location: Franz Hall 2258A

Lexin Li, Professor
Department of Biostatistics & Helen Wills Neuroscience Institute at UC Berkeley

Abstract:

Understanding the inner workings of human brains, as well as their connections with neurological disorders, is one of the most intriguing scientific questions. Studies in neuroscience are greatly facilitated by a variety of neuroimaging technologies, including anatomical magnetic resonance imaging (MRI), functional magnetic resonance imaging (fMRI), electroencephalography (EEG), diffusion tensor imaging, positron emission tomography (PET), among many others. The size and complexity of medical imaging data, however, pose numerous challenges, and call for constant development of new statistical methods. In this talk, I give an overview of a range of neuroimaging topics our group has been investigating, including imaging tensor analysis, brain connectivity network analysis, multimodality analysis, and imaging causal analysis. I also illustrate with a number of specific case studies.

Bio:

Lexin Li, Ph.D., is a Professor of Biostatistics at the Department of Biostatistics and Epidemiology, and Helen Wills Neuroscience Institute, of the University of California, Berkeley. His research interests include neuroimaging analysis, deep brain stimulation, brain-computer-interface, reinforcement learning, ordinary differential equations, tensor data analysis, and network data analysis. He is a Fellow of the American Statistical Association (ASA), a Fellow of the Institute of Mathematical Statistics (IMS), and an Elected Member of the International Statistical Institute (ISI). He is the Editor-in-Chief of the Annals of Applied Statistics for 2025-27.

Thursday 03/13/25, Time: 11:00am – 12:15pm, Learning Deep Generative Models from Corrupted Data

Location: Franz 2258A

Oscar Leong, Assistant Professor
UCLA Department of Statistics

Abstract:

Deep generative models have demonstrated remarkable success in high-dimensional, high-resolution natural image synthesis. Their capabilities show promise across a wide range of applications, particularly in inverse problems where reconstruction from partial or noisy measurements is crucial. However, their generation performance heavily relies on access to high-quality training data. This limits their application in scientific domains where uncorrupted, noise-free data is either costly or impossible to obtain. This talk will present two vignettes addressing such challenges. In the first, leveraging tools from variational inference, we introduce a framework to learn deep generative priors directly from noisy measurements by exploiting common, low-dimensional manifold structure shared among the underlying ground-truth images. We demonstrate the effectiveness of this approach in a compressed sensing problem inspired by black-hole imaging. In the second, we present the surprising finding that distillation — a method primarily used to accelerate diffusion models without sacrificing generation quality — can serve as an effective approach for training high-quality generative models from noisy data. Across varying noise levels and datasets, we show how one can distill a degraded diffusion model to obtain a generative model with enhanced sample quality. This perspective reframes distillation as not only a tool for efficiency but also a mechanism for improving generative models, particularly in low-quality data settings.

This work is based on the following two papers: https://arxiv.org/abs/2304.05589 and https://arxiv.org/abs/2503.07578.

Bio:

Oscar Leong is an Assistant Professor in the Department of Statistics and Data Science at the University of California, Los Angeles. He earned his Ph.D. from Rice University in Computational and Applied Mathematics in 2021, where he was an NSF Graduate Research Fellow. Prior to joining UCLA, he was a von Karman Instructor in the Computing and Mathematical Sciences Department at the California Institute of Technology. His research lies at the interface of the mathematics of data science and the theory and application of machine learning for inverse problems.

Thursday 03/06/25, Time: 11:00am – 12:15pm, Point Process Learning: Statistical Learning for Spatial Point Processes

Location: Franz 2258A

Julia Jansson, Ph.D. Student
Chalmers University of Technology and the University of Gothenburg

Abstract:

The i.i.d. assumption is often violated in practice, such as in real-world data on ambulance calls, nerve fibers and earthquakes. One way to model this type of data, which has dependency among the observations, is through point processes. Recently, to extend statistical learning methods to the field of spatial point processes, Point Process Learning (PPL) was introduced. In this talk, I will present PPL and describe its statistical properties in the context of Gibbs point processes. We show that PPL is a robust competitor to state-of-the-art parameter estimation methods such as Takacs-Fiksel estimation and its special case, pseudolikelihood.

Bio:

Julia Jansson is a PhD student at the joint mathematics department of Chalmers University of Technology and University of Gothenburg, located in Sweden. Her research focus is spatial point processes, specifically Gibbs point processes, which can be used to model locations of trees in a forest, stars in a galaxy, or the structure of materials. During the winter quarter of 2025, she is a Visiting Graduate Researcher at UCLA, working with Professor Rick Schoenberg’s research group on modeling earthquakes with spatio-temporal point processes.

Thursday 02/27/25, Time: 11:00am – 12:15pm, Computationally Efficient Periodic Changepoint Detection

Location: Franz 2258A

Rebecca Killick, Professor of Statistics
Lancaster University, UK

Abstract:

When considering finer-scale environmental data, e.g. daily or sub-daily, we have to model the finer-scale periodicities and/or changes that become part of the ‘noise’ to the climate signal we wish to understand. It can be hard to disentangle the large-scale, climate-driven changes from the finer-scale changes, yet failure to model the finer behaviour can lead to incorrect inference on the large-scale patterns. In addition, the finer-scale data has larger, often nonstationary, second-order behaviour than its monthly or yearly counterparts. To combat these issues of nonstationary second-order structure and multiscale changes, we propose a hierarchical circular changepoint approach that separately models the within-year (fine-scale) changes from the across-year (climate-related) changes whilst allowing for a nonstationary error structure. We demonstrate the approach on temperature data and an application from digital health monitoring.

Bio:

Rebecca Killick received their PhD in Statistics from Lancaster University, where they are a Professor and Director of Research. For 2024/25 Rebecca is also a visiting Professor at UC Santa Cruz. Their primary research interests lie in the development of novel methodology for the analysis of univariate and multivariate nonstationary time series models. Rebecca is highly motivated by real-world problems and has worked with data in a range of fields including bioinformatics, energy, engineering, environment, finance, health, linguistics and official statistics. Rebecca is a firm advocate for open-source software and for contributing to the wider statistical community.

Thursday 02/20/25, Time: 11:00am – 12:15pm, Optimal PhiBE — A Model-Free PDE-Based Framework for Continuous-Time Reinforcement Learning

Location: Franz 2258A

Yuhua Zhu, Assistant Professor
Department of Statistics and Data Science, UCLA

Abstract:

This talk addresses continuous-time reinforcement learning (RL) in settings where the system dynamics are governed by a stochastic differential equation but remain unknown, with only discrete-time observations available. While the optimal Bellman equation enables model-free algorithms, its discretization error is significant when the reward function oscillates. Conversely, model-based PDE approaches offer better accuracy but suffer from non-identifiable inverse problems. To bridge this gap, we introduce Optimal-PhiBE, an equation that integrates discrete-time information into a PDE, combining the strengths of both the RL and PDE formulations. Compared to the RL formulation, Optimal-PhiBE is less sensitive to reward oscillations, leading to smaller discretization errors. In linear-quadratic control, Optimal-PhiBE can even achieve an accurate continuous-time optimal policy using only discrete-time information. Compared to the PDE formulation, it skips the identification of the dynamics and enables model-free algorithm derivation. Furthermore, we extend Optimal-PhiBE to higher orders, providing increasingly accurate approximations. At the end of the talk, I will discuss how this technique can be leveraged to generate time-dependent samples and tackle goal-oriented inverse problems.

Bio:

Yuhua Zhu is an assistant professor in the Department of Statistics and Data Science at UCLA. Previously, she was an assistant professor at UC San Diego, where she held a joint appointment in the Halicioğlu Data Science Institute (HDSI) and the Department of Mathematics. Before that, she was a Postdoctoral Fellow at Stanford University and earned her Ph.D. from UW-Madison. Her work bridges the gap between partial differential equations and machine learning, with a focus on reinforcement learning, stochastic optimization, and uncertainty quantification.

Thursday 02/13/25, Time: 11:00am – 12:15pm, The High-Dimensional Asymptotics of Principal Components Regression

Location: Franz 2258A

Alden Green, Stein Fellow
Department of Statistics, Stanford University

Abstract:

We study principal components regression (PCR) in an asymptotic high-dimensional setting, where the number of data points is proportional to the dimension. We derive exact limiting formulas for estimation and prediction risk, which depend in a complicated manner on the eigenvalues of the population covariance, the alignment between the population PCs and the true signal, and the number of selected PCs. A key challenge in the high-dimensional setting stems from the fact that the sample covariance is an inconsistent estimate of its population counterpart, so that sample PCs may fail to fully capture potential latent low-dimensional structure in the data. We demonstrate this point through several case studies, including that of a spiked covariance model.
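
The estimator under study is classical and easy to state; the sketch below runs basic PCR in a toy spiked covariance model of the kind mentioned in the abstract (all simulation choices are illustrative).

```python
# Principal components regression in its basic form: project onto the top-k
# sample PCs, then run least squares in the reduced space, here in a toy
# spiked covariance model where one latent factor drives the response.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n, p = 500, 250                              # dimension proportional to n
z = rng.normal(size=n)                       # latent factor (the "spike")
u = rng.normal(size=p); u /= np.linalg.norm(u)
X = 3.0 * np.outer(z, u) + rng.normal(size=(n, p))  # spiked covariance design
y = 2.0 * z + rng.normal(size=n)             # response driven by the factor

for k in (1, 5, 20):                         # number of retained sample PCs
    pcr = make_pipeline(PCA(n_components=k), LinearRegression()).fit(X, y)
    print(f"k={k:2d} PCs: in-sample R^2 = {pcr.score(X, y):.3f}")
```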

Bio:

Alden Green is a Stein Fellow in the Stanford Department of Statistics, where he works on problems related to high-dimensional regression, dimensionality reduction, graph-based nonparametric estimation and hypothesis testing, and selective inference. Previously, he obtained his PhD in Statistics from Carnegie Mellon University, where his thesis was awarded the Umesh K. Gavaskar Memorial Thesis Award. During his PhD, Alden also participated in COVID-19 forecasting efforts as a core member of the DELPHI group.

Tuesday 02/11/25, Time: 11:00am – 12:15pm, Causal Fairness Analysis

Location: Mathematical Sciences 8359

Drago Plecko, Postdoctoral Scholar
Department of Computer Science, Columbia University

Abstract:

In this talk, we discuss the foundations of fairness analysis through the lens of causal inference, also paying attention to how questions of fairness compound with the use of artificial intelligence (AI). In particular, the framework of Causal Fairness Analysis is introduced, which distinguishes three fairness tasks: (i) bias detection, (ii) fair prediction, and (iii) fair decision-making. In bias detection, we demonstrate how commonly used statistical measures of disparity cannot distinguish between causally different explanations of the disparity, and we discuss causal tools that bridge this gap. In fair prediction, we discuss how an automated predictor may inherit bias from human-generated labels, and how this can be formally tested and subsequently mitigated. For the task of fair decision-making, we discuss how human or AI decision-makers design policies for treatment allocation, focusing on how much a specific individual would benefit from treatment, counterfactually speaking, when contrasted with an alternative, no-treatment scenario. We discuss how historically disadvantaged groups may differ in their distribution of covariates and, therefore, their benefit from treatment may differ, possibly leading to disparities in resource allocation. The discussion of each task is accompanied by real-world examples, in an attempt to build a catalog of different fairness settings. We also take a deeper look into applying Causal Fairness Analysis to explain racial and ethnic disparities following admission to an intensive care unit (ICU). Our analysis reveals that minority patients are much more likely to be admitted to the ICU, and that this increase in admission is linked with lack of access to primary care. This leads us to construct the Indigenous Intensive Care Equity (IICE) Radar, a monitoring system for tracking the over-utilization of ICU resources by the Indigenous population of Australia across geographical areas, opening the door for targeted public health interventions aimed at improving health equity.

Related papers:
[1] Plečko, Drago, and Elias Bareinboim. “Causal fairness analysis: a causal toolkit for fair machine learning.” Foundations and Trends® in Machine Learning 17.3 (2024): 304-589.
[2] Plečko, Drago, et al. “An Algorithmic Approach for Causal Health Equity: A Look at Race Differentials in Intensive Care Unit (ICU) Outcomes.” arXiv preprint arXiv:2501.05197 (2025).

Bio:

Drago Plecko is a postdoctoral scholar in the Department of Computer Science at Columbia University, having joined after completing his PhD in Statistics at ETH Zürich. His research focuses on causal inference, and spans several topics in trustworthy data science, including fairness, recourse, and explainability. Drago also has a strong interest in applied problems, particularly in medicine, where he investigated epidemiological questions in intensive care medicine.

Thursday 02/06/25, Time: 11:00am – 12:15pm, A Unified Framework for Efficient Learning at Scale

Location: Franz Hall 2258A

Soufiane Hayou, Postdoctoral Scholar
Simons Institute, UC Berkeley

Abstract:

State-of-the-art performance is usually achieved via a series of modifications to existing neural architectures and their training procedures. A common feature of these networks is their large-scale nature: modern neural networks usually have billions – if not hundreds of billions – of trainable parameters. While empirical evaluations generally support the claim that increasing the scale of neural networks (width, depth, etc) boosts model performance if done correctly, optimizing the training process across different scales remains a significant challenge, and practitioners tend to follow empirical scaling laws from the literature. In this talk, I will present a unified framework for efficient learning at large scale. The framework allows us to derive efficient learning rules that automatically adjust to model scale, ensuring stability and optimal performance. By analyzing the interplay between network architecture, optimization dynamics, and scale, we demonstrate how these theoretically-grounded learning rules can be applied to both pretraining and finetuning. The results offer new insights into the fundamental principles governing neural network scaling and provide practical guidelines for training large-scale models efficiently.

Bio:

Soufiane Hayou is currently a postdoctoral researcher at the Simons Institute, UC Berkeley. Before that, he was a visiting assistant professor of mathematics at the National University of Singapore for three years. He obtained his PhD in statistics and machine learning in 2021 from the University of Oxford, and graduated from Ecole Polytechnique in Paris before joining Oxford. His research is mainly focused on the theory and practice of learning at scale: theoretical analysis of large-scale neural networks with the goal of obtaining principled methods for training and finetuning. Topics include depth scaling (Stable ResNet), hyperparameter transfer (Depth-muP parametrization), and efficient finetuning (LoRA+, a method that improves upon LoRA by setting optimal learning rates for the matrices A and B).

Thursday 01/30/25, Time: 11:00am – 12:15pm, Modern Sampling Paradigms: from Posterior Sampling to Generative AI

Location: Franz Hall 2258A

Yuchen Wu, Postdoctoral Researcher
Department of Statistics and Data Science at the Wharton School, University of Pennsylvania

Abstract:

Sampling from a target distribution is a recurring theme in statistics and generative artificial intelligence (AI). In statistics, posterior sampling offers a flexible inferential framework, enabling uncertainty quantification, probabilistic prediction, as well as the estimation of intractable quantities. In generative AI, sampling aims to generate unseen instances that emulate a target population, such as the natural distributions of texts, images, and molecules. In this talk, I will present my works on designing provably efficient sampling algorithms, addressing challenges in both statistics and generative AI. In the first part, I will focus on posterior sampling for Bayes sparse regression. In general, such posteriors are high-dimensional and contain many modes, making them challenging to sample from. To address this, we develop a novel sampling algorithm based on decomposing the target posterior into a log-concave mixture of simple distributions, reducing sampling from a complex distribution to sampling from a tractable log-concave one. We establish provable guarantees for our method in a challenging regime that was previously intractable. In the second part, I will describe a training-free acceleration method for diffusion models, which are deep generative models that underpin cutting-edge applications such as AlphaFold, DALL-E and Sora. Our approach is simple to implement, wraps around any pre-trained diffusion model, and comes with a provable convergence rate that strengthens prior theoretical results. We demonstrate the effectiveness of our method on several real-world image generation tasks. Lastly, I will outline my vision for bridging the fields of statistics and generative AI, exploring how insights from one domain can drive progress in the other.
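
As background for the first part, here is a textbook unadjusted Langevin sampler for a smooth log-concave target, the kind of tractable sampling problem the proposed decomposition reduces to (this is generic background, not the paper's algorithm).

```python
# Textbook unadjusted Langevin algorithm (generic background, not the paper's
# method): sample pi(x) ~ exp(-U(x)) by discretizing the Langevin diffusion
# dX = -grad U dt + sqrt(2) dW. Log-concavity of the target (convex U) is
# what makes samplers of this kind provably fast, which is why reducing a
# multimodal posterior to log-concave pieces is powerful.
import numpy as np

rng = np.random.default_rng(0)

def grad_U(x):
    """U(x) = |x|^2 / 2, i.e. a standard Gaussian target (log-concave)."""
    return x

eta, n_steps, d = 1e-2, 20_000, 5
x = np.zeros(d)
samples = []
for _ in range(n_steps):
    x = x - eta * grad_U(x) + np.sqrt(2 * eta) * rng.normal(size=d)
    samples.append(x.copy())

samples = np.array(samples[5_000:])         # discard burn-in
print("sample mean ~ 0:", samples.mean(axis=0).round(2))
print("sample var  ~ 1:", samples.var(axis=0).round(2))
```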

Bio:

Yuchen Wu is a departmental postdoctoral researcher in the Department of Statistics and Data Science at the Wharton School, University of Pennsylvania. She earned her Ph.D. in 2023 from Stanford University, where she was advised by Professor Andrea Montanari. Her research lies broadly at the intersection of statistics and machine learning, featuring generative AI, high-dimensional statistics, Bayesian inference, algorithm design, and data-driven decision making.

Tuesday 01/28/25, Time: 11:00am – 12:15pm, Policy Evaluation in Dynamic Experiments

Location: Mathematical Sciences 8359

Yuchen Hu, Ph.D. Student
Management Science and Engineering, Stanford University

Abstract:

Experiments where treatment assignment varies over time, such as micro-randomized trials and switchback experiments, are essential for guiding dynamic decisions. These experiments often exhibit nonstationarity due to factors like hidden states or unstable environments, posing substantial challenges for accurate policy evaluation. In this talk, I will discuss how Partially Observed Markov Decision Processes (POMDPs) with explicit mixing assumptions provide a natural framework for modeling dynamic experiments and can guide both the design and analysis of these experiments. In the first part of the talk, I will discuss properties of switchback experiments in finite-population, nonstationary dynamic systems. We find that, in this setting, standard switchback designs suffer considerably from carryover bias, but judicious use of burn-in periods can substantially improve the situation and enable errors that decay nearly at the parametric rate. In the second part of the talk, I will discuss policy evaluation in micro-randomized experiments and provide further theoretical grounding for mixing-based policy evaluation methodologies. Under a sequential ignorability assumption, we provide rate-matching upper and lower bounds that sharply characterize the hardness of off-policy evaluation in POMDPs. These findings demonstrate the promise of using stochastic modeling techniques to enhance tools for causal inference. Our formal results are mirrored in empirical evaluations using ride-sharing and mobile health simulators.
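
The burn-in idea from the first part can be illustrated schematically as follows; the dynamics and all numbers are invented for this toy, and the point is only that observations just after a switch still carry the previous regime, so discarding them reduces bias.

```python
# Schematic of the burn-in idea for switchback experiments (dynamics invented
# for illustration): treatment effects carry over through a slowly adapting
# state, so observations right after a switch are contaminated; dropping a
# burn-in period at the start of each block reduces the carryover bias of
# the difference-in-means estimator.
import numpy as np

rng = np.random.default_rng(0)
T, block, burn = 10_000, 50, 10
treat = (np.arange(T) // block) % 2          # alternating switchback design

state, y = 0.0, np.empty(T)
for t in range(T):
    state = 0.8 * state + 0.2 * treat[t]     # carryover: state adapts slowly
    y[t] = state + rng.normal(scale=0.5)     # long-run treatment effect = 1.0

within = np.arange(T) % block                # time since the last switch
keep = within >= burn                        # discard burn-in observations

naive = y[treat == 1].mean() - y[treat == 0].mean()
burned = y[keep & (treat == 1)].mean() - y[keep & (treat == 0)].mean()
print(f"naive: {naive:.3f}, with burn-in: {burned:.3f} (truth 1.0)")
```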

Bio:

Yuchen Hu is a Ph.D. candidate in Management Science and Engineering at Stanford University, under the supervision of Professor Stefan Wager. Her research focuses on causal inference, data-driven decision making, and stochastic processes. She is particularly interested in developing interdisciplinary statistical methodologies that enhance the applicability, robustness, and efficiency of data-driven decisions in complex environments. Hu holds an M.S. in Biostatistics from Harvard University and a B.Sc. in Applied Mathematics from Hong Kong Polytechnic University.

Thursday 01/23/25, Time: 11:00am – 12:15pm, Transfer and Multi-task Learning: Statistical Insights for Modern Data Challenges

Location: Franz Hall 2258A

Ye Tian, Ph.D. Student
Department of Statistics, Columbia University

Abstract:

Knowledge transfer, a core human ability, has inspired numerous data integration methods in machine learning and statistics. However, data integration faces significant challenges: (1) unknown similarity between data sources; (2) data contamination; (3) high-dimensionality; and (4) privacy constraints. This talk addresses these challenges in three parts across different contexts, presenting both innovative statistical methodologies and theoretical insights. In Part I, I will introduce a transfer learning framework for high-dimensional generalized linear models that combines a pre-trained Lasso with a fine-tuning step. We provide theoretical guarantees for both estimation and inference, and apply the methods to predict county-level outcomes of the 2020 U.S. presidential election, uncovering valuable insights. In Part II, I will explore an unsupervised learning setting where task-specific data is generated from a mixture model with heterogeneous mixture proportions. This complements the supervised learning setting discussed in Part I, addressing scenarios where labeled data is unavailable. We propose a federated gradient EM algorithm that is communication-efficient and privacy-preserving, providing estimation error bounds for the mixture model parameters. In Part III, I will introduce a representation-based multi-task learning framework that generalizes the distance-based similarity notion discussed in Parts I and II. This framework is closely related to modern applications of fine-tuning in image classification and natural language processing. I will discuss how this study enhances our understanding of the effectiveness of fine-tuning and the influence of data contamination on representation multi-task learning. Finally, I will summarize the talk and briefly introduce my broader research interests. The three main sections of this talk are based on a series of papers [TF23, TWXF22, TWF24, TGF23] and a short course I co-taught at NESS 2024 [STL24]. More about me and my research can be found at https://yet123.com.

[TF23] Tian, Y., & Feng, Y. (2023). Transfer Learning under High-dimensional Generalized Linear Models. Journal of the American Statistical Association, 118(544), 2684-2697.
[TWXF22] Tian, Y., Weng, H., Xia, L., & Feng, Y. (2022). Unsupervised Multi-task and Transfer Learning on Gaussian Mixture Models. arXiv preprint arXiv:2209.15224.
[TWF24] Tian, Y., Weng, H., & Feng, Y. (2024). Towards the Theory of Unsupervised Federated Learning: Non-asymptotic Analysis of Federated EM Algorithms. ICML 2024.
[TGF23] Tian, Y., Gu, Y., & Feng, Y. (2023). Learning from Similar Linear Representations: Adaptivity, Minimaxity, and Robustness. arXiv preprint arXiv:2303.17765.
[STL24] A (Selective) Introduction to the Statistics Foundations of Transfer Learning. Short course, NESS 2024.
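
As a toy version of the “pre-trained Lasso plus fine-tuning” pipeline from Part I (illustrative only; not the paper's estimator or its tuning scheme):

```python
# Toy "pre-trained Lasso + fine-tuning" sketch (illustrative; not the paper's
# estimator or tuning scheme): fit a Lasso on a large source sample, then fit
# a sparse correction on the small target sample, so the target fit only has
# to learn the source-target difference.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
p = 100
beta_source = np.zeros(p); beta_source[:5] = 1.0
beta_target = beta_source.copy(); beta_target[0] = 0.5   # similar tasks

X_s = rng.normal(size=(2000, p)); y_s = X_s @ beta_source + rng.normal(size=2000)
X_t = rng.normal(size=(100, p));  y_t = X_t @ beta_target + rng.normal(size=100)

# Step 1: "pre-train" on the abundant source data.
pre = Lasso(alpha=0.05).fit(X_s, y_s)
# Step 2: "fine-tune" by learning a sparse correction on the target data.
delta = Lasso(alpha=0.05).fit(X_t, y_t - X_t @ pre.coef_)
beta_hat = pre.coef_ + delta.coef_

target_only = Lasso(alpha=0.05).fit(X_t, y_t).coef_
print("transfer error:   ", np.linalg.norm(beta_hat - beta_target).round(3))
print("target-only error:", np.linalg.norm(target_only - beta_target).round(3))
```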

Bio:

Ye Tian is a final-year Ph.D. student in Statistics at Columbia University. His research lies at the intersection of statistics, data science, and machine learning, focusing on three main topics: (1) reliable transfer learning; (2) high-dimensional statistics; and (3) privacy and fairness of the learning system.

Thursday 01/16/25, Time: 11:00am – 12:15pm, Scientific Machine Learning in the New Era of AI: Foundations, Visualization, and Reasoning

Location: Online

Wuyang Chen, Assistant Professor
Computing Science, Simon Fraser University

Abstract:

The rapid advancements in artificial intelligence (AI), propelled by data-centric scaling laws, have significantly transformed our understanding and generation of both vision and language. However, natural media, such as images, videos, and languages, represent only a fraction of the modalities we encounter, leaving much of the physical world underexplored. We propose that Scientific Machine Learning (SciML) offers a knowledge-driven framework that complements data-driven AI, enabling us to better understand, visualize, and interact with the diverse complexities of the physical world. In this talk, we will delve into the cutting-edge intersection of AI and SciML. First, we will discuss the automation of scientific analysis through multi-step reasoning grounded with formal languages, paving the way for more advanced control and interactions in scientific models. Second, we will demonstrate how SciML can streamline the visualization of intricate geometries, while also showing how spatial intelligence can be adapted for more robust SciML modeling. Finally, we will explore how scaling scientific data can train foundation models that integrate multiphysics knowledge, thereby enhancing traditional simulations with a deeper understanding of physical principles.

Bio:

Dr. Wuyang Chen is a tenure-track Assistant Professor in Computing Science at Simon Fraser University. Previously, he was a postdoctoral researcher in Statistics at the University of California, Berkeley. He obtained his Ph.D. in Electrical and Computer Engineering from the University of Texas at Austin in 2023. Dr. Chen’s research focuses on scientific machine learning, theoretical understanding of deep networks, and related applications in foundation models, computer vision, and AutoML. He also works on domain adaptation/generalization and self-supervised learning. Dr. Chen has published papers at CVPR, ECCV, ICLR, ICML, NeurIPS, and other top conferences. His research was recognized in an NSF (National Science Foundation) newsletter in 2022, by the INNS Doctoral Dissertation Award and the iSchools Doctoral Dissertation Award in 2024, and by the AAAI New Faculty Highlights in 2025. Dr. Chen hosted the Foundation Models for Science workshop at NeurIPS 2024 and co-organized the 4th and 5th editions of the UG2+ workshop and challenge at CVPR in 2021 and 2022. He also serves on the board of the One World Seminar Series on the Mathematics of Machine Learning.

Thursday 01/09/25, Time: 11:00am – 12:15pm, PDE-based model-free algorithm for Continuous-time Reinforcement Learning

Location: Franz Hall 2258A

Yuhua Zhu, Assistant Professor
UCLA Department of Statistics and Data Science

Abstract:

This talk addresses the problem of continuous-time reinforcement learning (RL) in scenarios where the dynamics follow a stochastic differential equation. When the underlying dynamics remain unknown and only discrete-time observations are available, how can we effectively conduct policy evaluation? We first highlight that while model-free RL algorithms are straightforward to implement, they are often not a reliable approximation of the true value function. On the other hand, model-based PDE approaches are more accurate, but the inverse problem is not easy to solve. To bridge this gap, we introduce a new Bellman equation, PhiBE, which integrates discrete-time information into a PDE formulation. PhiBE allows us to skip the identification of the dynamics and directly evaluate the value function using discrete-time data. Additionally, it offers a more accurate approximation of the true value function, especially in scenarios where the underlying dynamics change slowly. Moreover, we extend PhiBE to higher orders, providing increasingly accurate approximations.
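
For context, the identification step that PhiBE is designed to skip can be illustrated on a toy one-dimensional SDE: the model-based route must first estimate drift and diffusion from discrete-time increments before solving a PDE for the value function (this sketch shows only that baseline identification step, not PhiBE itself).

```python
# Toy illustration of the identification step that PhiBE is designed to skip
# (this is not PhiBE itself): given discrete-time observations of an unknown
# SDE ds = b(s) dt + sigma dW, the model-based route first estimates drift and
# diffusion from the increments before solving a PDE for the value function.
import numpy as np

rng = np.random.default_rng(0)
a, sigma, dt, T = 1.0, 0.5, 0.1, 200_000   # truth: ds = -a*s dt + sigma dW

s = np.empty(T)
s[0] = 0.0
for t in range(T - 1):                      # Euler-Maruyama simulation of the
    s[t + 1] = s[t] - a * s[t] * dt + sigma * np.sqrt(dt) * rng.normal()

ds = np.diff(s)
# Drift: regress increments/dt on the state (linear drift assumed here).
b_hat = np.sum(s[:-1] * ds / dt) / np.sum(s[:-1] ** 2)
# Diffusion: quadratic variation of the increments.
sigma2_hat = np.mean(ds ** 2) / dt

print(f"drift coef: {b_hat:.3f} (truth {-a}), "
      f"sigma^2: {sigma2_hat:.3f} (truth {sigma**2:.3f})")
```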

Bio:

Yuhua Zhu is an assistant professor in the Department of Statistics and Data Science at UCLA. Previously, she was an assistant professor at UC San Diego, where she held a joint appointment in the Halicioğlu Data Science Institute (HDSI) and the Department of Mathematics. Before that, she was a Postdoctoral Fellow at Stanford University and earned her Ph.D. from UW-Madison. Her work bridges the gap between partial differential equations and machine learning, with a focus on reinforcement learning, stochastic optimization, and uncertainty quantification.

Thursday 12/05/24, Time: 11:00am – 12:15pm, Accelerating, Enhancing, and Securing Deep Generative Models with Score Identity Distillation

Location: Haines A25

Mingyuan Zhou, Associate Professor
Statistics Group, McCombs School of Business, University of Texas at Austin

Abstract:

Diffusion models, renowned for their photorealistic generation capabilities, face significant challenges, including slow generation speeds and the risk of producing inappropriate content. In this talk, I will begin with an overview of deep generative models before introducing a paradigm shift in utilizing pretrained diffusion models for data generation. Instead of relying on these models to reverse the diffusion process through iterative refinement, I will demonstrate the exceptional efficacy and versatility of distilling a one-step generator using the framework of Score identity Distillation (SiD).

Challenging the conventional belief that high-quality diffusion-based generation requires iterative refinement, SiD demonstrates that a bias-corrected estimation of the gradient of a model-based Fisher divergence is all that is needed to distill a high-performing single-step generator from a teacher model. This data-free approach not only accelerates generation but also often surpasses the quality of the teacher models, which typically depend on dozens to hundreds of iterative steps.

SiD further enhances generative performance by incorporating training data to enable joint distillation and adversarial generation, resulting in substantial improvements over its teacher models. Moreover, safeguards can be seamlessly integrated into SiD to selectively forget unsafe concepts, such as nudity and personal identities, promoting safer and more ethical content generation. These advancements establish SiD as a robust and versatile framework for high-quality, efficient, and secure generative AI, opening new avenues for groundbreaking research and practical applications.

Bio:

Mingyuan Zhou is an Associate Professor at the University of Texas at Austin, specializing in machine learning and probabilistic modeling. He earned his Ph.D. from Duke University in 2013. His research spans generative models, statistical inference for big data, deep learning, and reinforcement learning. Currently, he focuses on advancing generative AI technologies to improve their efficiency, speed, and safety. Recent examples of his research group’s contributions include Diffusion-QL, Diffusion-GAN, Beta Diffusion, and Score identity Distillation and its variants. He also serves as an Action Editor for the Journal of Machine Learning Research.

Thursday 11/21/24, Time: 11:00am – 12:15pm, A Panel Discussion with Professor Nancy Reid

Location: Haines A25

Nancy Reid, Professor
Department of Statistical Science, University of Toronto

Abstract:

A panel discussion with Professor Nancy Reid will take place.

Bio:

Nancy Reid is University Professor in the Department of Statistical Sciences at the University of Toronto. Her research interests include statistical theory, likelihood inference, design of studies, and statistical science in public policy. She has held many professional leadership roles in statistical science, in Canada and abroad. Nancy studied at the University of Waterloo (B.Math. 1974), the University of British Columbia (M.Sc. 1976), Stanford University (PhD 1979) and Imperial College, London (PDF 1980). She joined the University of Toronto in 1986 from the University of British Columbia. Nancy is a Fellow of the Royal Society, the Royal Society of Canada, the American Association for the Advancement of Science, and a Foreign Associate of the National Academy of Sciences. In 2015 she was appointed Officer of the Order of Canada. In 2023 she received the David R. Cox Foundations of Statistics Award, “for contributions to the foundations of statistics that significantly advanced the frontiers of statistics and for insight that transformed understanding of parametric statistical inference”.

Wednesday 11/20/24, Time: 3:30pm – 4:30pm, When Likelihood Goes Wrong

Location: CHS 43105A

Joint Seminar hosted by UCLA Biostatistics

Nancy Reid, Professor
Department of Statistical Science, University of Toronto

Abstract:

Inference based on the likelihood function is the workhorse of statistics, and constructing the likelihood function is often the first step in any detailed analysis, even for very complex data. At the same time, statistical theory tells us that ‘black-box’ use of likelihood inference can be very sensitive to the dimension of the parameter space, the structure of the parameter space, and any measurement error in the data. This has been recognized for a long time, and many alternative approaches have been suggested with a view to preserving some of the virtues of likelihood inference while ameliorating some of the difficulties. In this talk I will discuss some of the ways that likelihood inference can go wrong, and some of the potential remedies, with particular emphasis on model misspecification.
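
As one concrete instance of how likelihood inference degrades under misspecification (our illustration, not necessarily the talk's example): if the working model \(f(y;\theta)\) is wrong, the maximum likelihood estimator \(\hat\theta\) still converges to the pseudo-true value \(\theta^*\) that minimizes the Kullback-Leibler divergence to the truth, but the usual inverse-Fisher-information variance is no longer valid. Instead,
\[
\sqrt{n}\,(\hat\theta - \theta^*) \xrightarrow{d} N\big(0,\; A(\theta^*)^{-1} B(\theta^*)\, A(\theta^*)^{-1}\big),
\qquad
A = -\mathbb{E}\big[\nabla_\theta^2 \log f\big], \quad
B = \mathbb{E}\big[\nabla_\theta \log f\, \nabla_\theta \log f^\top\big],
\]
and the sandwich collapses to the familiar \(A^{-1}\) only when \(A = B\), i.e., when the model is correctly specified.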

Bio:

Nancy Reid is University Professor in the Department of Statistical Sciences at the University of Toronto. Her research interests include statistical theory, likelihood inference, design of studies, and statistical science in public policy. She has held many professional leadership roles in statistical science, in Canada and abroad. Nancy studied at the University of Waterloo (B.Math. 1974), the University of British Columbia (M.Sc. 1976), Stanford University (PhD 1979) and Imperial College, London (PDF 1980). She joined the University of Toronto in 1986 from the University of British Columbia. Nancy is a Fellow of the Royal Society, the Royal Society of Canada, the American Association for the Advancement of Science, and a Foreign Associate of the National Academy of Sciences. In 2015 she was appointed Officer of the Order of Canada. In 2023 she received the David R. Cox Foundations of Statistics Award, “for contributions to the foundations of statistics that significantly advanced the frontiers of statistics and for insight that transformed understanding of parametric statistical inference”.

Thursday 11/14/24, Time: 11:00am – 12:15pm, Advancing Statistical Rigor in Educational Assessment and Single-Cell Omics Using In Silico Control Data

Location: Haines Hall A25

Guanao Yan, Ph.D. Student
UCLA Department of Statistics and Data Science

Abstract:

In this talk, I will explore how in silico control data can be used to enhance statistical rigor in two distinct fields: educational assessment and single-cell omics data analysis. First, I will discuss the application of in silico data in educational contexts to promote fairness in assessment. Specifically, I will highlight my work on developing statistical tools to detect patterns of collusion in online exams. By incorporating in silico data as negative controls, we can quantify errors—such as false positives—ensuring that educational assessments accurately reflect true student performance. Next, I will address challenges in single-cell data analysis, particularly the complexity of selecting the right tool from over 1,700 available computational methods. One promising solution is the generation and use of synthetic data as positive controls. This approach establishes trustworthy evaluation standards, enabling more accurate method comparisons and providing rigorous evidence to advance our understanding of cellular biology.
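
The negative-control logic in the first part of the abstract can be sketched generically; the similarity score, the simulated exam, and the target false-positive rate below are illustrative assumptions, not the speaker's actual method.

```python
# Generic sketch of calibrating a collusion-detection threshold with
# in silico negative controls. The similarity score and the simulated
# data are illustrative assumptions, not the speaker's actual method.
import numpy as np

rng = np.random.default_rng(0)
n_students, n_questions = 200, 40

# Simulated answer matrix: each student answers independently (a null exam).
answers = rng.integers(0, 4, size=(n_students, n_questions))

def similarity(a, b):
    """Fraction of identical answers between two students."""
    return np.mean(a == b)

# Negative controls: pairs built from independently re-simulated students,
# so any "collusion" signal among them is a false positive by construction.
null_scores = np.array([
    similarity(rng.integers(0, 4, n_questions), rng.integers(0, 4, n_questions))
    for _ in range(5000)
])

# Calibrate the threshold to a 0.1% false-positive rate on the controls.
threshold = np.quantile(null_scores, 0.999)

# Apply to observed pairs: flag suspiciously similar pairs.
flagged = [
    (i, j)
    for i in range(n_students)
    for j in range(i + 1, n_students)
    if similarity(answers[i], answers[j]) > threshold
]
print(f"threshold = {threshold:.3f}, flagged pairs: {len(flagged)}")
```

Because the control pairs are independent by construction, any control pair crossing the threshold is a false positive, so calibrating the threshold on the controls directly bounds the false-positive rate of the detector.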

Bio:

Guanao Yan is a fifth-year PhD student in the Department of Statistics and Data Science at the University of California, Los Angeles (UCLA), where he is supervised by Dr. Jingyi Jessica Li. His research focuses on developing novel statistical methods to address real-world data challenges, spanning multiple fields. In statistical bioinformatics, he specializes in analyzing single-cell and spatial omics data, with a particular emphasis on using synthetic data to improve the statistical rigor of these analyses. His work also extends to general statistical methodologies, including high-dimensional model inference and variable selection, as well as to education, where he develops statistical methods to promote equity by ensuring fair and accurate assessments in educational systems.

Thursday, 11/07/2024, Time: 11:00am-12:15pm, Selecting Informative Subdata from a Large Dataset

Location: Haines Hall A25

John Stufken, Professor
Department of Statistics, George Mason University

Abstract:

Exploration or inference for large datasets can be computationally expensive. Depending on the objectives, it is often possible to obtain reliable results without using all of the observations. This has motivated the development of methods that select some of the observations from a large dataset, through either a stochastic or a deterministic approach, and draw conclusions based on the selected data only. Ideally, such selection methods are robust to model misspecification, serve a range of objectives (e.g., exploration, estimation, prediction), and are computationally efficient. I will present a selective overview of methods developed in recent years for selecting informative subdata.
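
As one concrete example from this literature, information-based optimal subdata selection (IBOSS), which the speaker co-developed, deterministically keeps the rows with the most extreme values of each covariate. The sketch below is a rough rendering under illustrative assumptions (Gaussian covariates, a linear model with no intercept), not a faithful reimplementation.

```python
# Rough sketch of the IBOSS-style deterministic selection idea for linear
# regression: for each covariate in turn, keep the not-yet-selected rows
# with the most extreme values. Data, sizes, and tie handling are illustrative.
import numpy as np

rng = np.random.default_rng(1)
n, p, k = 100_000, 5, 1_000      # full data size, covariates, subdata size
X = rng.standard_normal((n, p))
beta = np.arange(1, p + 1, dtype=float)
y = X @ beta + rng.standard_normal(n)

r = k // (2 * p)                 # rows to take per covariate per tail
selected = np.zeros(n, dtype=bool)
for j in range(p):
    order = np.argsort(X[:, j])
    order = order[~selected[order]]          # skip rows already chosen
    selected[order[:r]] = True               # smallest values
    selected[order[-r:]] = True              # largest values

X_sub, y_sub = X[selected], y[selected]
beta_hat, *_ = np.linalg.lstsq(X_sub, y_sub, rcond=None)
print(f"subdata size = {selected.sum()}, estimate = {np.round(beta_hat, 2)}")
```

The motivation is D-optimality: in a linear model, extreme design points carry the most information about the slope parameters, so a small, well-chosen subdata set can estimate them nearly as well as the full data.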

Bio:

John Stufken is Professor in the Department of Statistics at George Mason University, where he also serves as Associate Chair for Research. Prior to this he was Bank of America Excellence Professor and Director for Informatics and Analytics at UNC Greensboro (2019-2022), Charles Wexler Endowed Professor and Coordinator for Statistics at Arizona State University (2014-2019), Head of the Department of Statistics at the University of Georgia (2003-2014), Program Director for Statistics at the National Science Foundation (2000-2003), Assistant, Associate and Full Professor in the Department of Statistics at Iowa State University (1988-2002), and Assistant and Associate Professor in the Department of Statistics at the University of Georgia (1986-1990). Stufken’s research interests are in design and analysis of experiments and subsampling of big data. He currently serves as co-Editor for Statistica Sinica (2023-2026), and has in the past been Editor for The American Statistician (2009-2011) and the Journal of Statistical Planning and Inference (2004-2006). He is an Elected Fellow of the ASA and IMS, an Elected Member of the ISI, and served as the Rothschild Distinguished Visiting Fellow at the Isaac Newton Institute for Mathematical Sciences during the 2011 workshop on Design and Analysis of Experiments.

Thursday, 10/31/2024, Time: 11:00am-12:15pm, How to be a good and a bad Statistician: Ethical considerations in Model Construction?

Location: Haines Hall A25

Mahtash Esfandiari, Senior Continuing Lecturer
UCLA Department of Statistics and Data Science

Ariana Anderson, Assistant Professor-in-Residence
Psychiatry and Biobehavioral Science, UCLA School of Medicine

Wenhong Sun, Senior
UCLA Department of Statistics and Data Science

Abstract:

In this presentation, we will: 1) present a short description of the ethical guidelines of the American Statistical Association and the Royal Statistical Society, 2) introduce four major ethical considerations in machine learning research proposed by Toms and Whithworth, which resulted from an extensive literature review, collaboration with national statistics institutes, and feedback from stakeholders, and 3) provide real-world examples in which these four major ethical considerations are jeopardized.

Then, we introduce a case study in which multiple teams of Statistics and Data Science seniors enrolled in our Capstone course (Introduction to the Practice of Statistical Consulting) implemented different machine learning methods to analyze baby-cry data from Kaggle while playing the roles of good and bad statisticians, with Mahtash Esfandiari as their instructor, Dr. Ariana Anderson as the client, and Wenhong Sun as one of the team members who participated in the project.

Finally, we will: 1) present a meta-analysis of the accuracy of the predictive models created by the teams playing good and bad statisticians, and 2) discuss which of the four major ethical considerations discussed above could potentially be jeopardized in predicting the type of baby cry from the Kaggle data.

Bio:

Dr. Mahtash Esfandiari is a full-time faculty member in the Department of Statistics and Data Science at UCLA. She obtained her PhD in cognitive science and applied statistics from the University of Illinois at Urbana-Champaign. She has extensive consulting experience in the medical sciences, industry, and the evaluation of interventions. Her research interests include the application of cognitive science to the enhancement of learning and teaching of statistics, diversity, and emotional intelligence (EQ) as a major component of successful statistical consulting. Over her career, Dr. Esfandiari has obtained several research grants from the National Science Foundation and the College of Letters and Science at UCLA and has done innovative work in statistics education, diversity, and statistical consulting.

Dr. Ariana Anderson is an Assistant Professor in the Departments of Psychiatry and Biobehavioral Sciences and Statistics at the University of California, Los Angeles (UCLA). She earned her Bachelor’s degree in Mathematics, followed by a Master’s and Doctorate in Statistics, and completed postdoctoral training in neuroimaging and psychiatry at UCLA’s Department of Psychiatry and Biobehavioral Sciences. For over 13 years, she has been developing new statistical methodologies to enhance the analysis of clinical and brain imaging data. Dr. Anderson’s research focuses on mapping structural and functional brain changes related to pathological processes, utilizing both unsupervised and supervised MRI-based neuroimaging statistical techniques. She serves as the Principal Investigator on multiple NIH-funded brain-imaging studies that explore how changes in brain structure and function due to disease contribute to cognitive decline.

Wenhong Sun is a senior in the Department of Statistics and Data Science at UCLA.

Thursday, 10/17/2024, Time: 11:00am-12:15pm, Wasserstein Regression of Covariance Matrix on Vector Covariates for Single Cell Gene Co-expression Analysis

Location: Haines Hall A25

Hongzhe Li, Professor
Biostatistics, Epidemiology and Informatics, Perelman School of Medicine at the University of Pennsylvania

Abstract:

Population-level single-cell gene expression data captures the gene expressions of thousands of cells from each individual within a sizable cohort. This data enables the construction of cell-type- and individual-specific gene co-expression networks by estimating the covariance matrices. Understanding how such co-expression networks are associated with individual-level covariates is crucial. This paper considers Fréchet regression with the covariance matrix as the outcome and vector covariates, using the Wasserstein distance between covariance matrices as a substitute for the Euclidean distance. A test statistic is defined based on the Fréchet mean and covariate-weighted Fréchet mean. The asymptotic distribution of the test statistic is derived under the assumption of simultaneously diagonalizable covariance matrices. Results from an analysis of large-scale single-cell data reveal an association between the co-expression network of genes in the nutrient sensing pathway and age, indicating a perturbation in gene co-expression networks with aging. More general Fréchet regression on the Bures-Wasserstein manifold will also be discussed and applied to the same single-cell RNA-seq data.
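
For reference (in our notation), the Wasserstein distance between two centered Gaussian distributions with covariance matrices \(\Sigma_1\) and \(\Sigma_2\), also known as the Bures-Wasserstein distance, is
\[
d_W^2(\Sigma_1, \Sigma_2) = \operatorname{tr}(\Sigma_1) + \operatorname{tr}(\Sigma_2)
- 2\,\operatorname{tr}\Big[\big(\Sigma_1^{1/2}\,\Sigma_2\,\Sigma_1^{1/2}\big)^{1/2}\Big],
\]
and Fréchet regression with a covariance-matrix outcome \(S\) replaces the Euclidean conditional mean by the covariate-weighted minimizer
\[
m_\oplus(x) = \operatorname*{arg\,min}_{\Sigma \succeq 0}\; \mathbb{E}\big[\, s(X, x)\, d_W^2(\Sigma, S) \,\big],
\]
where \(s(X,x)\) denotes the Fréchet-regression weight function.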

Biography:

Dr. Hongzhe Li is Perelman Professor of Biostatistics, Epidemiology and Informatics at the Perelman School of Medicine at the University of Pennsylvania. He is Vice Chair for Research Integration, Director of the Center of Statistics in Big Data, and former Chair of the Graduate Program in Biostatistics at Penn. He is also a Professor of Statistics and Data Science at the Wharton School and a Professor of Applied Mathematics and Computational Science at Penn. Dr. Li has been elected as a Fellow of the American Statistical Association (ASA), a Fellow of the Institute of Mathematical Statistics (IMS), and a Fellow of the American Association for the Advancement of Science (AAAS). Dr. Li served on the Board of Scientific Counselors of the National Cancer Institute of NIH and regularly serves on various NIH study sections. He served as Chair of the Section on Statistics in Genomics and Genetics of the ASA and Co-Editor-in-Chief of Statistics in Biosciences. Dr. Li’s research focuses on developing statistical and computational methods for the analysis of large-scale genetic, genomic, and metagenomic data and on the theory of high-dimensional statistics. He has trained over 50 PhD students and postdoctoral fellows.

Thursday, 10/10/2024, Time: 11:00am-12:15pm, A power law Hawkes process modelling earthquake occurrences

Location: Haines Hall A25

Boris Baeumer, Professor
Department of Mathematics and Statistics, University of Otago

Abstract:

In order to capture the increased frequency of earthquakes (aftershocks) after a large event, we use a Hawkes process model based on the first relaxation eigenmode of a visco-elastic plate model; i.e., we assume the kernel functions of the Hawkes model are Mittag-Leffler functions. Assuming that magnitude and frequency are separable leads to a model that outperforms Ogata’s ETAS model in predicting earthquake frequency for most databases. When we remove the restrictive assumption that magnitude and frequency are separable, obtaining a parsimonious model of the joint process requires modeling the impact of an earthquake of a given magnitude on the intensity measures of all earthquakes. We use marked multivariate Hawkes processes to inform the shape of a parsimonious kernel.
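
In generic form (our notation; the talk's exact parameterization may differ), a Hawkes process has conditional intensity
\[
\lambda(t) = \mu + \sum_{t_i < t} \phi(t - t_i),
\]
where the sum runs over past events and \(\phi\) is the excitation kernel. Here \(\phi\) is built from the Mittag-Leffler function
\[
E_\alpha(z) = \sum_{k=0}^{\infty} \frac{z^k}{\Gamma(\alpha k + 1)}, \qquad 0 < \alpha \le 1,
\]
for which \(E_\alpha(-t^\alpha)\) decays exponentially when \(\alpha = 1\) and like a power law when \(\alpha < 1\), matching the slow relaxation of aftershock rates.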

Biography:

Boris Baeumer is a Professor in the Department of Mathematics and Statistics at the University of Otago, New Zealand. He obtained his PhD in Mathematics from Louisiana State University. His research interests include non-local PDEs and associated stochastic processes. Over his career he has obtained several major research grants and is currently (co-)principal investigator on the project “Modelling the domino effect in complex systems”.

Thursday, 10/03/2024, Time: 11:00am-12:15pm, Causal inference in network experiments: regression-based analysis and design-based properties

Location: Haines Hall A25

Peng Ding, Associate Professor
Department of Statistics, UC Berkeley

Abstract:

Investigating interference or spillover effects among units is a central task in many social science problems. Network experiments are powerful tools for this task: they avoid endogeneity by randomly assigning treatments to units over networks. However, it is non-trivial to analyze network experiments properly without imposing strong modeling assumptions. Previously, many researchers have proposed sophisticated point estimators and standard errors for causal effects under network experiments. We further show that regression-based point estimators and standard errors can have strong theoretical guarantees if the regression functions and robust standard errors are carefully specified to accommodate the interference patterns under network experiments. We first recall a well-known result that the Hajek estimator is numerically identical to the coefficient from the weighted-least-squares fit based on the inverse probability of the exposure mapping. Moreover, we demonstrate that the regression-based approach offers three notable advantages: its ease of implementation, the ability to derive standard errors through the same weighted-least-squares fit, and the capacity to integrate covariates into the analysis, thereby enhancing estimation efficiency. Furthermore, we analyze the asymptotic bias of the regression-based network-robust standard errors. Recognizing that the covariance estimator can be anti-conservative, we propose an adjusted covariance estimator to improve the empirical coverage rates. Although we focus on regression-based point estimators and standard errors, our theory holds under the design-based framework, which assumes that the randomness comes solely from the design of network experiments and allows for arbitrary misspecification of the regression models.
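
The numerical identity recalled in the abstract is easy to verify in a simplified setting. The toy simulation below collapses the exposure mapping to a binary exposure with known probabilities, which is our simplification of the general setting.

```python
# Toy check (a binary-exposure simplification) that the Hajek estimator
# equals the coefficient from a weighted-least-squares fit with inverse
# exposure-probability weights.
import numpy as np

rng = np.random.default_rng(2)
n = 500
pi = rng.uniform(0.2, 0.8, n)          # unit-level exposure probabilities
Z = rng.binomial(1, pi)                # realized exposure
Y = 1.0 + 2.0 * Z + rng.standard_normal(n)

# Hajek estimator: weighted means within each exposure arm.
w1, w0 = Z / pi, (1 - Z) / (1 - pi)
hajek = (w1 @ Y) / w1.sum() - (w0 @ Y) / w0.sum()

# WLS of Y on (1, Z) with weights w = w1 + w0.
w = w1 + w0
Xd = np.column_stack([np.ones(n), Z])
sw = np.sqrt(w)
coef, *_ = np.linalg.lstsq(Xd * sw[:, None], Y * sw, rcond=None)

print(f"Hajek: {hajek:.6f}, WLS coefficient on Z: {coef[1]:.6f}")  # identical
```

With a binary regressor and weights Z_i/pi_i + (1 - Z_i)/(1 - pi_i), WLS fits the weighted mean of each exposure arm, so its slope coefficient reproduces the Hajek difference exactly.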

Biography:

Peng Ding is an Associate Professor in the Department of Statistics at UC Berkeley. He obtained his Ph.D. from the Department of Statistics, Harvard University in May 2015, and worked as a postdoctoral researcher in the Department of Epidemiology, Harvard T. H. Chan School of Public Health until December 2015.