2024–2025 Academic Year

Thursday 03/06/25, Time: 11:00am – 12:15pm, Point Process Learning: Statistical Learning for Spatial Point Processes

Location: Franz 2258A

Julia Jansson, Ph.D. Student
Chalmers University of Technology and the University of Gothenburg

Abstract:

The i.i.d. assumption is often violated in practice, as in real-world data on ambulance calls, nerve fibers, and earthquakes. One way to model such data, in which observations are dependent, is through point processes. Recently, Point Process Learning (PPL) was introduced to extend statistical learning methods to spatial point processes. In this talk, I will present PPL and describe its statistical properties in the context of Gibbs point processes. We show that PPL is a robust competitor to state-of-the-art parameter estimation methods such as Takacs-Fiksel estimation and its special case, pseudolikelihood.
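
For readers less familiar with the estimators named above, the following standard definitions are general background rather than material specific to the talk. For a Gibbs point process with Papangelou conditional intensity \(\lambda_\theta(u; \mathbf{x})\) observed in a window \(W\), the log pseudolikelihood is

\[
\log \mathrm{PL}(\theta) \;=\; \sum_{x_i \in \mathbf{x}} \log \lambda_\theta\!\big(x_i; \mathbf{x} \setminus \{x_i\}\big) \;-\; \int_W \lambda_\theta(u; \mathbf{x})\, \mathrm{d}u,
\]

while Takacs-Fiksel estimation solves empirical versions of the Georgii-Nguyen-Zessin balance equation

\[
\mathbb{E}\Big[\sum_{x_i \in \mathbf{x}} h\big(x_i; \mathbf{x} \setminus \{x_i\}\big)\Big]
\;=\;
\mathbb{E}\Big[\int_W h(u; \mathbf{x})\, \lambda_\theta(u; \mathbf{x})\, \mathrm{d}u\Big]
\]

for user-chosen test functions \(h\); taking \(h = \partial_\theta \log \lambda_\theta\) recovers the pseudolikelihood score equations.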

Bio:

Julia Jansson is a PhD student at the joint mathematics department of Chalmers University of Technology and the University of Gothenburg in Sweden. Her research focuses on spatial point processes, specifically Gibbs point processes, which can be used to model the locations of trees in a forest, stars in a galaxy, or the structure of materials. During the winter quarter of 2025, she is a Visiting Graduate Researcher at UCLA, working with Professor Rick Schoenberg’s research group on modeling earthquakes with spatio-temporal point processes.

Thursday 02/27/25, Time: 11:00am – 12:15pm, Computationally Efficient Periodic Changepoint Detection

Location: Franz 2258A

Rebecca Killick, Professor of Statistics
Lancaster University, UK

When considering finer-scale environmental data, e.g. daily or sub-daily, we have to model the finer-scale periodicities and/or changes that become part of the ‘noise’ around the climate signal we wish to understand. It can be hard, however, to disentangle the large-scale, climate-driven changes from the finer-scale changes, and failure to model the finer behaviour can lead to incorrect inference on the large-scale patterns. In addition, the finer-scale data has larger, often nonstationary, second-order behaviour than its monthly or yearly counterparts. To combat these issues of nonstationary second-order structure and multiscale changes, we propose a hierarchical circular changepoint approach that separately models the within-year (fine-scale) changes from the across-year (climate-related) changes whilst allowing for a nonstationary error structure. We demonstrate the approach on temperature data and an application from digital health monitoring.

Bio:

Rebecca Killick received their PhD degree in Statistics from Lancaster University, where they now hold the positions of Professor and Director of Research. For 2024/25 Rebecca is also a visiting Professor at UC Santa Cruz. Their primary research interests lie in the development of novel methodology for the analysis of univariate and multivariate nonstationary time series. Rebecca is highly motivated by real-world problems and has worked with data in a range of fields including Bioinformatics, Energy, Engineering, Environment, Finance, Health, Linguistics and Official Statistics. Rebecca is a firm advocate for open-source software and for contributing to the wider statistical community.

Thursday 02/20/25, Time: 11:00am – 12:15pm, Optimal PhiBE — A Model-Free PDE-Based Framework for Continuous-Time Reinforcement Learning

Location: Franz 2258A

Yuhua Zhu, Assistant Professor
Department of Statistics and Data Science, UCLA

This talk addresses continuous-time reinforcement learning (RL) in settings where the system dynamics are governed by a stochastic differential equation but remain unknown, with only discrete-time observations available. While the optimal Bellman equation enables model-free algorithms, its discretization error is significant when the reward function oscillates. Conversely, model-based PDE approaches offer better accuracy but suffer from non-identifiable inverse problems. To bridge this gap, we introduce Optimal-PhiBE, an equation that integrates discrete-time information into a PDE, combining the strengths of both the RL and PDE formulations. Compared to the RL formulation, Optimal-PhiBE is less sensitive to reward oscillations, leading to smaller discretization errors. In linear-quadratic control, Optimal-PhiBE can even recover an accurate continuous-time optimal policy from discrete-time information alone. Compared to the PDE formulation, it skips the identification of the dynamics and enables the derivation of model-free algorithms. Furthermore, we extend Optimal-PhiBE to higher orders, providing increasingly accurate approximations. At the end of the talk, I will discuss how this technique can be leveraged to generate time-dependent samples and tackle goal-oriented inverse problems.

Bio:

Yuhua Zhu is an assistant professor in the Department of Statistics and Data Science at UCLA. Previously, she was an assistant professor at UC San Diego, where she held a joint appointment in the Halicioğlu Data Science Institute (HDSI) and the Department of Mathematics. Before that, she was a Postdoctoral Fellow at Stanford University and earned her Ph.D. from UW-Madison. Her work bridges the gap between partial differential equations and machine learning, with a focus on reinforcement learning, stochastic optimization, and uncertainty quantification.

Thursday 02/13/25, Time: 11:00am – 12:15pm, The High-Dimensional Asymptotics of Principal Components Regression

Location: Franz 2258A

Alden Green, Stein Fellow
Department of Statistics, Stanford University

We study principal components regression (PCR) in an asymptotic high-dimensional setting, where the number of data points is proportional to the dimension. We derive exact limiting formulas for estimation and prediction risk, which depend in a complicated manner on the eigenvalues of the population covariance, the alignment between the population PCs and the true signal, and the number of selected PCs. A key challenge in the high-dimensional setting stems from the fact that the sample covariance is an inconsistent estimate of its population counterpart, so that sample PCs may fail to fully capture potential latent low-dimensional structure in the data. We demonstrate this point through several case studies, including that of a spiked covariance model.
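
As a concrete reference for what "regression on the top k sample PCs" computes in this setting, here is a minimal numpy sketch; the data, dimensions, and variable names are illustrative assumptions, not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative proportional-asymptotics regime: n and p of the same order.
n, p, k = 300, 150, 10
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p)
y = X @ beta + rng.standard_normal(n)

# Principal components regression: project the centered design onto its
# top-k sample principal directions, then run least squares in that space.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
V_k = Vt[:k].T                      # top-k sample PC directions, shape (p, k)
Z = Xc @ V_k                        # PC scores, shape (n, k)
gamma_hat, *_ = np.linalg.lstsq(Z, y - y.mean(), rcond=None)
beta_pcr = V_k @ gamma_hat          # PCR coefficients in the original basis

# Prediction on new points reuses the training centering and projection.
X_new = rng.standard_normal((5, p))
y_pred = (X_new - X.mean(axis=0)) @ beta_pcr + y.mean()
print(y_pred)
```

In the proportional regime studied in the talk, n and p grow together, so the sample directions V_k can differ substantially from the population principal components, which is the source of the phenomena described in the abstract.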

Bio:

Alden Green is a Stein Fellow in the Stanford Department of Statistics, where he works on problems related to high-dimensional regression, dimensionality reduction, graph-based nonparametric estimation and hypothesis testing, and selective inference. Previously, he obtained his PhD in Statistics from Carnegie Mellon University, where his thesis was awarded the Umesh K. Gavaskar Memorial Thesis Award. During his PhD, Alden also participated in COVID-19 forecasting efforts as a core member of the DELPHI group.

Tuesday 02/11/25, Time: 11:00am – 12:15pm, Causal Fairness Analysis

Location: Mathematical Sciences 8359

Drago Plecko, Postdoctoral Scholar
Department of Computer Science, Columbia University

In this talk, we discuss the foundations of fairness analysis through the lens of causal inference, also paying attention to how questions of fairness compound with the use of artificial intelligence (AI). In particular, the framework of Causal Fairness Analysis is introduced, which distinguishes three fairness tasks: (i) bias detection, (ii) fair prediction, and (iii) fair decision-making. In bias detection, we demonstrate how commonly used statistical measures of disparity cannot distinguish between causally different explanations of the disparity, and we discuss causal tools that bridge this gap. In fair prediction, we discuss how an automated predictor may inherit bias from human-generated labels, and how this can be formally tested and subsequently mitigated. For the task of fair decision-making, we discuss how human or AI decision-makers design policies for treatment allocation, focusing on how much a specific individual would benefit from treatment, counterfactually speaking, when contrasted with an alternative, no-treatment scenario. We discuss how historically disadvantaged groups may differ in their distribution of covariates and, therefore, their benefit from treatment may differ, possibly leading to disparities in resource allocation. The discussion of each task is accompanied by real-world examples, in an attempt to build a catalog of different fairness settings. We also take a deeper look into applying Causal Fairness Analysis to explain racial and ethnic disparities following admission to an intensive care unit (ICU). Our analysis reveals that minority patients are much more likely to be admitted to the ICU, and that this increase in admission is linked with a lack of access to primary care. This leads us to construct the Indigenous Intensive Care Equity (IICE) Radar, a monitoring system for tracking the over-utilization of ICU resources by the Indigenous population of Australia across geographical areas, opening the door for targeted public health interventions aimed at improving health equity.

Related papers:
[1] Plečko, Drago, and Elias Bareinboim. “Causal fairness analysis: a causal toolkit for fair machine learning.” Foundations and Trends® in Machine Learning 17.3 (2024): 304-589.
[2] Plečko, Drago, et al. “An Algorithmic Approach for Causal Health Equity: A Look at Race Differentials in Intensive Care Unit (ICU) Outcomes.” arXiv preprint arXiv:2501.05197 (2025).

Bio:

Drago Plecko is a postdoctoral scholar in the Department of Computer Science at Columbia University, having joined after completing his PhD in Statistics at ETH Zürich. His research focuses on causal inference, and spans several topics in trustworthy data science, including fairness, recourse, and explainability. Drago also has a strong interest in applied problems, particularly in medicine, where he investigated epidemiological questions in intensive care medicine.

Thursday 02/06/25, Time: 11:00am – 12:15pm, A Unified Framework for Efficient Learning at Scale

Location: Franz Hall 2258A

Soufiane Hayou, Postdoctoral Scholar
Simons Institute, UC Berkeley

Abstract:

State-of-the-art performance is usually achieved via a series of modifications to existing neural architectures and their training procedures. A common feature of these networks is their large-scale nature: modern neural networks usually have billions – if not hundreds of billions – of trainable parameters. While empirical evaluations generally support the claim that increasing the scale of neural networks (width, depth, etc.) boosts model performance if done correctly, optimizing the training process across different scales remains a significant challenge, and practitioners tend to follow empirical scaling laws from the literature. In this talk, I will present a unified framework for efficient learning at large scale. The framework allows us to derive efficient learning rules that automatically adjust to model scale, ensuring stability and optimal performance. By analyzing the interplay between network architecture, optimization dynamics, and scale, we demonstrate how these theoretically grounded learning rules can be applied to both pretraining and finetuning. The results offer new insights into the fundamental principles governing neural network scaling and provide practical guidelines for training large-scale models efficiently.

Bio:

Soufiane Hayou is currently a postdoctoral researcher at the Simons Institute, UC Berkeley. Before that, he was a visiting assistant professor of mathematics at the National University of Singapore for three years. He obtained his PhD in statistics and machine learning in 2021 from the University of Oxford, and graduated from Ecole Polytechnique in Paris before joining Oxford. His research is mainly focused on the theory and practice of learning at scale: theoretical analysis of large-scale neural networks with the goal of obtaining principled methods for training and finetuning. Topics include depth scaling (Stable ResNet), hyperparameter transfer (Depth-muP parametrization), and efficient finetuning (LoRA+, a method that improves upon LoRA by setting optimal learning rates for the matrices A and B).

Thursday 01/30/25, Time: 11:00am – 12:15pm, Modern Sampling Paradigms: from Posterior Sampling to Generative AI

Location: Franz Hall 2258A

Yuchen Wu, Postdoctoral Researcher
Department of Statistics and Data Science at the Wharton School, University of Pennsylvania

Abstract:

Sampling from a target distribution is a recurring theme in statistics and generative artificial intelligence (AI). In statistics, posterior sampling offers a flexible inferential framework, enabling uncertainty quantification, probabilistic prediction, as well as the estimation of intractable quantities. In generative AI, sampling aims to generate unseen instances that emulate a target population, such as the natural distributions of texts, images, and molecules. In this talk, I will present my works on designing provably efficient sampling algorithms, addressing challenges in both statistics and generative AI. In the first part, I will focus on posterior sampling for Bayes sparse regression. In general, such posteriors are high-dimensional and contain many modes, making them challenging to sample from. To address this, we develop a novel sampling algorithm based on decomposing the target posterior into a log-concave mixture of simple distributions, reducing sampling from a complex distribution to sampling from a tractable log-concave one. We establish provable guarantees for our method in a challenging regime that was previously intractable. In the second part, I will describe a training-free acceleration method for diffusion models, which are deep generative models that underpin cutting-edge applications such as AlphaFold, DALL-E and Sora. Our approach is simple to implement, wraps around any pre-trained diffusion model, and comes with a provable convergence rate that strengthens prior theoretical results. We demonstrate the effectiveness of our method on several real-world image generation tasks. Lastly, I will outline my vision for bridging the fields of statistics and generative AI, exploring how insights from one domain can drive progress in the other.

Bio:

Yuchen Wu is a departmental postdoctoral researcher in the Department of Statistics and Data Science at the Wharton School, University of Pennsylvania. She earned her Ph.D. in 2023 from Stanford University, where she was advised by Professor Andrea Montanari. Her research lies broadly at the intersection of statistics and machine learning, featuring generative AI, high-dimensional statistics, Bayesian inference, algorithm design, and data-driven decision making.

Tuesday 01/28/25, Time: 11:00am – 12:15pm, Policy Evaluation in Dynamic Experiments

Location: Mathematical Sciences 8359

Yuchen Hu, Ph.D. Student
Management Science and Engineering, Stanford University

Abstract:

Experiments where treatment assignment varies over time, such as micro-randomized trials and switchback experiments, are essential for guiding dynamic decisions. These experiments often exhibit nonstationarity due to factors like hidden states or unstable environments, posing substantial challenges for accurate policy evaluation. In this talk, I will discuss how Partially Observed Markov Decision Processes (POMDPs) with explicit mixing assumptions provide a natural framework for modeling dynamic experiments and can guide both the design and analysis of these experiments. In the first part of the talk, I will discuss properties of switchback experiments in finite-population, nonstationary dynamic systems. We find that, in this setting, standard switchback designs suffer considerably from carryover bias, but judicious use of burn-in periods can markedly improve the situation and enable errors that decay at nearly the parametric rate. In the second part of the talk, I will discuss policy evaluation in micro-randomized experiments and provide further theoretical grounding for mixing-based policy evaluation methodologies. Under a sequential ignorability assumption, we provide rate-matching upper and lower bounds that sharply characterize the hardness of off-policy evaluation in POMDPs. These findings demonstrate the promise of using stochastic modeling techniques to enhance tools for causal inference. Our formal results are mirrored in empirical evaluations using ride-sharing and mobile health simulators.

Bio:

Yuchen Hu is a Ph.D. candidate in Management Science and Engineering at Stanford University, under the supervision of Professor Stefan Wager. Her research focuses on causal inference, data-driven decision making, and stochastic processes. She is particularly interested in developing interdisciplinary statistical methodologies that enhance the applicability, robustness, and efficiency of data-driven decisions in complex environments. Hu holds an M.S. in Biostatistics from Harvard University and a B.Sc. in Applied Mathematics from Hong Kong Polytechnic University.

Thursday 01/23/25, Time: 11:00am – 12:15pm, Transfer and Multi-task Learning: Statistical Insights for Modern Data Challenges

Location: Franz Hall 2258A

Ye Tian, Ph.D. Student
Department of Statistics, Columbia University

Abstract:

Knowledge transfer, a core human ability, has inspired numerous data integration methods in machine learning and statistics. However, data integration faces significant challenges: (1) unknown similarity between data sources; (2) data contamination; (3) high-dimensionality; and (4) privacy constraints. This talk addresses these challenges in three parts across different contexts, presenting both innovative statistical methodologies and theoretical insights. In Part I, I will introduce a transfer learning framework for high-dimensional generalized linear models that combines a pre-trained Lasso with a fine-tuning step. We provide theoretical guarantees for both estimation and inference, and apply the methods to predict county-level outcomes of the 2020 U.S. presidential election, uncovering valuable insights. In Part II, I will explore an unsupervised learning setting where task-specific data is generated from a mixture model with heterogeneous mixture proportions. This complements the supervised learning setting discussed in Part I, addressing scenarios where labeled data is unavailable. We propose a federated gradient EM algorithm that is communication-efficient and privacy-preserving, providing estimation error bounds for the mixture model parameters. In Part III, I will introduce a representation-based multi-task learning framework that generalizes the distance-based similarity notion discussed in Parts I and II. This framework is closely related to modern applications of fine-tuning in image classification and natural language processing. I will discuss how this study enhances our understanding of the effectiveness of fine-tuning and the influence of data contamination on representation multi-task learning. Finally, I will summarize the talk and briefly introduce my broader research interests. The three main sections of this talk are based on a series of papers [TF23, TWXF22, TWF24, TGF23] and a short course I co-taught at NESS 2024 [STL24]. More about me and my research can be found at https://yet123.com.

[TF23] Tian, Y., & Feng, Y. (2023). Transfer Learning under High-dimensional Generalized Linear Models. Journal of the American Statistical Association, 118(544), 2684-2697.
[TWXF22] Tian, Y., Weng, H., Xia, L., & Feng, Y. (2022). Unsupervised Multi-task and Transfer Learning on Gaussian Mixture Models. arXiv preprint arXiv:2209.15224.
[TWF24] Tian, Y., Weng, H., & Feng, Y. (2024). Towards the Theory of Unsupervised Federated Learning: Non-asymptotic Analysis of Federated EM Algorithms. ICML 2024.
[TGF23] Tian, Y., Gu, Y., & Feng, Y. (2023). Learning from Similar Linear Representations: Adaptivity, Minimaxity, and Robustness. arXiv preprint arXiv:2303.17765.
[STL24] A (Selective) Introduction to the Statistics Foundations of Transfer Learning. (2024).

Bio:

Ye Tian is a final-year Ph.D. student in Statistics at Columbia University. His research lies at the intersection of statistics, data science, and machine learning, focusing on three main topics: (1) reliable transfer learning; (2) high-dimensional statistics; and (3) privacy and fairness of the learning system.

Thursday 01/16/25, Time: 11:00am – 12:15pm, Scientific Machine Learning in the New Era of AI: Foundations, Visualization, and Reasoning

Location: Online

Wuyang Chen, Assistant Professor
Computing Science, Simon Fraser University

Abstract:

The rapid advancements in artificial intelligence (AI), propelled by data-centric scaling laws, have significantly transformed our understanding and generation of both vision and language. However, natural media, such as images, videos, and languages, represent only a fraction of the modalities we encounter, leaving much of the physical world underexplored. We propose that Scientific Machine Learning (SciML) offers a knowledge-driven framework that complements data-driven AI, enabling us to better understand, visualize, and interact with the diverse complexities of the physical world. In this talk, we will delve into the cutting-edge intersection of AI and SciML. First, we will discuss the automation of scientific analysis through multi-step reasoning grounded with formal languages, paving the way for more advanced control and interactions in scientific models. Second, we will demonstrate how SciML can streamline the visualization of intricate geometries, while also showing how spatial intelligence can be adapted for more robust SciML modeling. Finally, we will explore how scaling scientific data can train foundation models that integrate multiphysics knowledge, thereby enhancing traditional simulations with a deeper understanding of physical principles.

Bio:

Dr. Wuyang Chen is a tenure-track Assistant Professor in Computing Science at Simon Fraser University. Previously, he was a postdoctoral researcher in Statistics at the University of California, Berkeley. He obtained his Ph.D. in Electrical and Computer Engineering from the University of Texas at Austin in 2023. Dr. Chen’s research focuses on scientific machine learning, theoretical understanding of deep networks, and related applications in foundation models, computer vision, and AutoML. He also works on domain adaptation/generalization and self-supervised learning. Dr. Chen has published papers at CVPR, ECCV, ICLR, ICML, NeurIPS, and other top conferences. Dr. Chen’s research has been recognized by the NSF (National Science Foundation) newsletter in 2022, the INNS Doctoral Dissertation Award and the iSchools Doctoral Dissertation Award in 2024, and the AAAI New Faculty Highlights in 2025. Dr. Chen hosted the Foundation Models for Science workshop at NeurIPS 2024 and co-organized the 4th and 5th editions of the UG2+ workshop and challenge at CVPR in 2021 and 2022. He also serves on the board of the One World Seminar Series on the Mathematics of Machine Learning.

Thursday 01/09/25, Time: 11:00am – 12:15pm, A PDE-Based Model-Free Algorithm for Continuous-Time Reinforcement Learning

Location: Franz Hall 2258A

Yuhua Zhu, Assistant Professor
UCLA Department of Statistics and Data Science

Abstract:

This talk addresses the problem of continuous-time reinforcement learning (RL) in scenarios where the dynamics follow a stochastic differential equation. When the underlying dynamics remain unknown and only discrete-time observations are available, how can we effectively conduct policy evaluation? We first highlight that while model-free RL algorithms are straightforward to implement, they are often not a reliable approximation of the true value function. On the other hand, model-based PDE approaches are more accurate, but the inverse problem is not easy to solve. To bridge this gap, we introduce a new Bellman equation, PhiBE, which integrates discrete-time information into a PDE formulation. PhiBE allows us to skip the identification of the dynamics and directly evaluate the value function using discrete-time data. Additionally, it offers a more accurate approximation of the true value function, especially in scenarios where the underlying dynamics change slowly. Moreover, we extend PhiBE to higher orders, providing increasingly accurate approximations.

Bio:

Yuhua Zhu is an assistant professor in the Department of Statistics and Data Science at UCLA. Previously, she was an assistant professor at UC San Diego, where she held a joint appointment in the Halicioğlu Data Science Institute (HDSI) and the Department of Mathematics. Before that, she was a Postdoctoral Fellow at Stanford University and earned her Ph.D. from UW-Madison. Her work bridges the gap between partial differential equations and machine learning, with a focus on reinforcement learning, stochastic optimization, and uncertainty quantification.

Thursday 12/05/24, Time: 11:00am – 12:15pm, Accelerating, Enhancing, and Securing Deep Generative Models with Score Identity Distillation

Location: Haines A25

Mingyuan Zhou, Associate Professor
Statistics Group, McCombs School of Business, University of Texas at Austin

Abstract:

Diffusion models, renowned for their photorealistic generation capabilities, face significant challenges, including slow generation speeds and the risk of producing inappropriate content. In this talk, I will begin with an overview of deep generative models before introducing a paradigm shift in utilizing pretrained diffusion models for data generation. Instead of relying on these models to reverse the diffusion process through iterative refinement, I will demonstrate the exceptional efficacy and versatility of distilling a one-step generator using the framework of Score identity Distillation (SiD).

Challenging the conventional belief that high-quality diffusion-based generation requires iterative refinement, SiD demonstrates that a bias-corrected estimation of the gradient of a model-based Fisher divergence is all that is needed to distill a high-performing single-step generator from a teacher model. This data-free approach not only accelerates generation but also often surpasses the quality of the teacher models, which typically depend on dozens to hundreds of iterative steps.

SiD further enhances generative performance by incorporating training data to enable joint distillation and adversarial generation, resulting in substantial improvements over its teacher models. Moreover, safeguards can be seamlessly integrated into SiD to selectively forget unsafe concepts, such as nudity and personal identities, promoting safer and more ethical content generation. These advancements establish SiD as a robust and versatile framework for high-quality, efficient, and secure generative AI, opening new avenues for groundbreaking research and practical applications.

Bio:

Mingyuan Zhou is an Associate Professor at the University of Texas at Austin, specializing in machine learning and probabilistic modeling. He earned his Ph.D. from Duke University in 2013. His research spans generative models, statistical inference for big data, deep learning, and reinforcement learning. Currently, he focuses on advancing generative AI technologies to improve their efficiency, speed, and safety. Recent examples of his research group’s contributions include Diffusion-QL, Diffusion-GAN, Beta Diffusion, and Score identity Distillation with its variations. He also serves as an Action Editor for the Journal of Machine Learning Research.

Thursday 11/21/24, Time: 11:00am – 12:15pm, A Panel Discussion with Professor Nancy Reid

Location: Haines A25

Nancy Reid, Professor
Department of Statistical Science, University of Toronto

Abstract:

A panel discussion with Professor Nancy Reid will take place.

Bio:

Nancy Reid is University Professor in the Department of Statistical Sciences at the University of Toronto. Her research interests include statistical theory, likelihood inference, design of studies, and statistical science in public policy. She has held many professional leadership roles in statistical science, in Canada and abroad. Nancy studied at the University of Waterloo (B.Math. 1974), the University of British Columbia (M.Sc. 1976), Stanford University (PhD 1979) and Imperial College, London (PDF 1980). She joined the University of Toronto in 1986 from the University of British Columbia. Nancy is a Fellow of the Royal Society, the Royal Society of Canada, the American Association for the Advancement of Science, and a Foreign Associate of the National Academy of Sciences. In 2015 she was appointed Officer of the Order of Canada. In 2023 she received the David R. Cox Foundations of Statistics Award, “for contributions to the foundations of statistics that significantly advanced the frontiers of statistics and for insight that transformed understanding of parametric statistical inference”.

Wednesday 11/20/24, Time: 3:30pm – 4:30pm, When Likelihood Goes Wrong

Location: CHS 43105A

Joint Seminar hosted by UCLA Biostatistics

Nancy Reid, Professor
Department of Statistical Science, University of Toronto

Abstract:

Inference based on the likelihood function is the workhorse of statistics, and constructing the likelihood function is often the first step in any detailed analysis, even for very complex data. At the same time, statistical theory tells us that ‘black-box’ use of likelihood inference can be very sensitive to the dimension of the parameter space, the structure of the parameter space, and any measurement error in the data. This has been recognized for a long time, and many alternative approaches have been suggested with a view to preserving some of the virtues of likelihood inference while ameliorating some of the difficulties. In this talk I will discuss some of the ways that likelihood inference can go wrong, and some of the potential remedies, with particular emphasis on model misspecification.

Bio:

Nancy Reid is University Professor in the Department of Statistical Sciences at the University of Toronto. Her research interests include statistical theory, likelihood inference, design of studies, and statistical science in public policy. She has held many professional leadership roles in statistical science, in Canada and abroad. Nancy studied at the University of Waterloo (B.Math. 1974), the University of British Columbia (M.Sc. 1976), Stanford University (PhD 1979) and Imperial College, London (PDF 1980). She joined the University of Toronto in 1986 from the University of British Columbia. Nancy is a Fellow of the Royal Society, the Royal Society of Canada, the American Association for the Advancement of Science, and a Foreign Associate of the National Academy of Sciences. In 2015 she was appointed Officer of the Order of Canada. In 2023 she received the David R. Cox Foundations of Statistics Award, “for contributions to the foundations of statistics that significantly advanced the frontiers of statistics and for insight that transformed understanding of parametric statistical inference”.

Thursday 11/14/24, Time: 11:00am – 12:15pm, Advancing Statistical Rigor in Educational Assessment and Single-Cell Omics Using In Silico Control Data

Location: Haines Hall A25

Guanao Yan, Ph.D. Student
UCLA Department of Statistics and Data Science

Abstract:

In this talk, I will explore how in silico control data can be used to enhance statistical rigor in two distinct fields: educational assessment and single-cell omics data analysis. First, I will discuss the application of in silico data in educational contexts to promote fairness in assessment. Specifically, I will highlight my work on developing statistical tools to detect patterns of collusion in online exams. By incorporating in silico data as negative controls, we can quantify errors—such as false positives—ensuring that educational assessments accurately reflect true student performance. Next, I will address challenges in single-cell data analysis, particularly the complexity of selecting the right tool from over 1,700 available computational methods. One promising solution is the generation and use of synthetic data as positive controls. This approach establishes trustworthy evaluation standards, enabling more accurate method comparisons and providing rigorous evidence to advance our understanding of cellular biology.

Bio:

Guanao Yan is a fifth-year PhD student in the Department of Statistics and Data Science at the University of California, Los Angeles (UCLA), where he is supervised by Dr. Jingyi Jessica Li. His research focuses on developing novel statistical methods to address real-world data challenges, spanning multiple fields. In statistical bioinformatics, he specializes in analyzing single-cell and spatial omics data, with a particular emphasis on using synthetic data to improve the statistical rigor of these analyses. His work also extends to general statistical methodologies, including high-dimensional model inference and variable selection, as well as to education, where he develops statistical methods to promote equity by ensuring fair and accurate assessments in educational systems.

Thursday, 11/07/2024, Time: 11:00am-12:15pm, Selecting Informative Subdata from a Large Dataset

Location: Haines Hall A25

John Stufken, Professor
Department of Statistics, George Mason University

Abstract:

Exploration or inference for large datasets can be computationally expensive. Depending on the objectives, it is often possible to obtain reliable results without using all of the observations. This has resulted in the development of methods for selecting some of the observations from a large dataset, either through a stochastic or a deterministic approach, and drawing conclusions based on the selected data only. Ideally, such selection methods are robust to model misspecification and objectives (e.g., exploration, estimation, prediction) and are computationally efficient. I will present a selective overview of methods developed in recent years for selecting informative subdata.

Bio:

John Stufken is Professor in the Department of Statistics at George Mason University, where he also serves as Associate Chair for Research. Prior to this he was Bank of America Excellence Professor and Director for Informatics and Analytics at UNC Greensboro (2019-2022), Charles Wexler Endowed Professor and Coordinator for Statistics at Arizona State University (2014-2019), Head of the Department of Statistics at the University of Georgia (2003-2014), Program Director for Statistics at the National Science Foundation (2000-2003), Assistant, Associate and Full Professor in the Department of Statistics at Iowa State University (1988-2002), and Assistant and Associate Professor in the Department of Statistics at the University of Georgia (1986-1990). Stufken’s research interests are in design and analysis of experiments and subsampling of big data. He currently serves as co-Editor for Statistica Sinica (2023-2026), and has in the past been Editor for The American Statistician (2009-2011) and the Journal of Statistical Planning and Inference (2004-2006). He is an Elected Fellow of the ASA and IMS, an Elected Member of the ISI, and served as the Rothschild Distinguished Visiting Fellow at the Isaac Newton Institute for Mathematical Sciences during the 2011 workshop on Design and Analysis of Experiments.

Thursday, 10/31/2024, Time: 11:00am-12:15pm, How to Be a Good and a Bad Statistician: Ethical Considerations in Model Construction?

Location: Haines Hall A25

Mahtash Esfandiari, Senior Continuing Lecturer
UCLA Department of Statistics and Data Science

Ariana Anderson, Assistant Professor-in-Residence
Psychiatry and Biobehavioral Science, UCLA School of Medicine

Wenhong Sun, Senior
UCLA Department of Statistics and Data Science

Abstract:

In this presentation, we will: 1) present a short description of the ethical guidelines of the American Statistical Association and the Royal Statistical Society, 2) introduce four major ethical considerations in machine learning research proposed by Toms and Whithworth, which resulted from an extensive literature review, collaboration with national statistics institutes, and feedback from stakeholders, and 3) provide real-world examples in which these four major ethical considerations are jeopardized.

Then, we introduce a case study in which multiple teams of Statistics and Data Science seniors enrolled in our Capstone course (Introduction to the Practice of Statistical Consulting) implemented different “machine learning” methods to analyze baby-cry data from Kaggle while playing the roles of good and bad statisticians, with Mahtash Esfandiari as their instructor, Dr. Ariana Anderson as the client, and Wenhong Sun as one of the team members who participated in the project.

Finally, we will: 1) present a meta-analysis of the accuracy of the predictive models created by the teams playing good and bad statisticians, and 2) discuss which of the four major ethical guidelines could potentially be jeopardized in predicting the type of baby cry using the Kaggle data.

Bio:

Dr. Mahtash Esfandiari is a full-time faculty member in the Department of Statistics and Data Science at UCLA. She obtained her PhD in cognitive science and applied statistics from the University of Illinois at Urbana-Champaign. She has extensive consulting experience in the medical sciences, industry, and the evaluation of interventions. Her research interests include the application of cognitive science to the enhancement of learning and teaching of statistics, diversity, and emotional intelligence (EQ) as a major component of successful statistical consulting. Over her career, Dr. Esfandiari has obtained several research grants from the National Science Foundation and the College of Letters and Science at UCLA and has done innovative work in statistics education, diversity, and statistical consulting.

Dr. Ariana Anderson is an Assistant Professor in the Departments of Psychiatry and Biobehavioral Sciences and Statistics at the University of California, Los Angeles (UCLA). She earned her Bachelor’s degree in Mathematics, followed by a Master’s and Doctorate in Statistics, and completed postdoctoral training in neuroimaging and psychiatry at UCLA’s Department of Psychiatry and Biobehavioral Sciences. For over 13 years, she has been developing new statistical methodologies to enhance the analysis of clinical and brain imaging data. Dr. Anderson’s research focuses on mapping structural and functional brain changes related to pathological processes, utilizing both unsupervised and supervised MRI-based neuroimaging statistical techniques. She serves as the Principal Investigator on multiple NIH-funded brain-imaging studies that explore how changes in brain structure and function due to disease contribute to cognitive decline.

Wenhong Sun is a senior in the Department of Statistics and Data Science at UCLA.

Thursday, 10/17/2024, Time: 11:00am-12:15pm, Wasserstein Regression of Covariance Matrix on Vector Covariates for Single Cell Gene Co-expression Analysis

Location: Haines Hall A25

Hongzhe Li, Professor
Biostatistics, Epidemiology and Informatics, Perelman School of Medicine at the University of Pennsylvania

Abstract:

Population-level single-cell gene expression data captures the gene expressions of thousands of cells from each individual within a sizable cohort. This data enables the construction of cell-type- and individual-specific gene co-expression networks by estimating the covariance matrices. Understanding how such co-expression networks are associated with individual-level covariates is crucial. This paper considers Fréchet regression with the covariance matrix as the outcome and vector covariates, using the Wasserstein distance between covariance matrices as a substitute for the Euclidean distance. A test statistic is defined based on the Fréchet mean and covariate-weighted Fréchet mean. The asymptotic distribution of the test statistic is derived under the assumption of simultaneously diagonalizable covariance matrices. Results from an analysis of large-scale single-cell data reveal an association between the co-expression network of genes in the nutrient sensing pathway and age, indicating a perturbation in gene co-expression networks with aging. More general Fréchet regression on the Bures-Wasserstein manifold will also be discussed and applied to the same single-cell RNA-seq data.
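
For reference, the metric referred to above is the 2-Wasserstein (Bures-Wasserstein) distance between centered Gaussian distributions with covariance matrices \(\Sigma_1\) and \(\Sigma_2\); the following is a standard formula, stated here as background rather than as part of the paper's results:

\[
d_{BW}^2(\Sigma_1, \Sigma_2)
\;=\;
\operatorname{tr}(\Sigma_1) + \operatorname{tr}(\Sigma_2)
- 2\,\operatorname{tr}\!\Big(\big(\Sigma_1^{1/2}\,\Sigma_2\,\Sigma_1^{1/2}\big)^{1/2}\Big).
\]

The covariate-weighted Fréchet mean at covariate value \(z\) is then the minimizer \(\arg\min_{\Sigma}\sum_i w_i(z)\, d_{BW}^2(\Sigma, \Sigma_i)\) over covariance matrices, which replaces the ordinary conditional mean in Fréchet regression.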

Biography:

Dr. Hongzhe Li is Perelman Professor of Biostatistics, Epidemiology and Informatics at the Perelman School of Medicine at the University of Pennsylvania. He is Vice Chair for Research Integration, Director of Center of Statistics in Big Data and former Chair of the Graduate Program in Biostatistics at Penn. He is also a Professor of Statistics and Data Science at the Wharton School and a Professor of Applied Mathematics and Computational Science at Penn. Dr. Li has been elected as a Fellow of the American Statistical Association (ASA), a Fellow of the Institute of Mathematical Statistics (IMS) and a Fellow of American Association for the Advancement of Science (AAAS). Dr. Li served on the Board of Scientific Counselors of the National Cancer Institute of NIH and regularly serves on various NIH study sections. He served as Chair of the Section on Statistics in Genomics and Genetics of the ASA and Co-Editor-in-Chief of Statistics in Biosciences. Dr. Li’s research focuses on developing statistical and computational methods for analysis of large-scale genetic, genomics and metagenomics data and theory on high dimensional statistics. He has trained over 50 PhD students and postdoctoral fellows.

Thursday, 10/10/2024, Time: 11:00am-12:15pm, A Power Law Hawkes Process Modelling Earthquake Occurrences

Location: Haines Hall A25

Boris Baeumer, Professor
Department of Mathematics and Statistics, University of Otago

Abstract:

In order to capture the increased frequency of earthquakes (aftershocks) after a large event, we use a Hawkes process model based on the first relaxation eigenmode of a visco-elastic plate model; i.e., we assume the kernel functions of the Hawkes model are Mittag-Leffler functions. Assuming that magnitude and frequency are separable leads to a model that, for most databases, outperforms Ogata’s ETAS model in predicting earthquake frequency. When we remove the restrictive assumption that magnitude and frequency are separable, obtaining a parsimonious model of the joint process requires modelling the impact of an earthquake of a given magnitude on the intensity measures of all earthquakes. We use marked multivariate Hawkes processes to inform the shape of a parsimonious kernel.
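
To fix ideas, a generic (unmarked) temporal Hawkes process has conditional intensity

\[
\lambda(t) \;=\; \mu \;+\; \sum_{t_i < t} \kappa\,\varphi(t - t_i),
\]

where \(\mu\) is the background rate, \(\kappa\) the branching ratio, and \(\varphi\) the excitation kernel; the model described above takes \(\varphi\) to be a Mittag-Leffler density, whose tail decays polynomially (roughly like \(t^{-(1+\beta)}\) for \(0<\beta<1\)) rather than exponentially. The exact parametrization used in the talk, including the treatment of magnitudes as marks, may differ from this sketch.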

Biography:

Boris Baeumer is a Professor in the Department of Mathematics and Statistics at the University of Otago, New Zealand. He obtained his PhD in Mathematics from Louisiana State University. His research interests include non-local PDEs and associated stochastic processes. Over his career he has obtained several major research grants and is currently a (co-)principal investigator on the project “Modelling the domino effect in complex systems”.

Thursday, 10/03/2024, Time: 11:00am-12:15pm, Causal Inference in Network Experiments: Regression-Based Analysis and Design-Based Properties

Location: Haines Hall A25

Peng Ding, Associate Professor
Department of Statistics, UC Berkeley

Abstract:

Investigating interference or spillover effects among units is a central task in many social science problems. Network experiments are powerful tools for this task, which avoids endogeneity by randomly assigning treatments to units over networks. However, it is non-trivial to analyze network experiments properly without imposing strong modeling assumptions. Previously, many researchers have proposed sophisticated point estimators and standard errors for causal effects under network experiments. We further show that regression-based point estimators and standard errors can have strong theoretical guarantees if the regression functions and robust standard errors are carefully specified to accommodate the interference patterns under network experiments. We first recall a well-known result that the Hajek estimator is numerically identical to the coefficient from the weighted-least-squares fit based on the inverse probability of the exposure mapping. Moreover, we demonstrate that the regression-based approach offers three notable advantages: its ease of implementation, the ability to derive standard errors through the same weighted-least-squares fit, and the capacity to integrate covariates into the analysis, thereby enhancing estimation efficiency. Furthermore, we analyze the asymptotic bias of the regression-based network-robust standard errors. Recognizing that the covariance estimator can be anti-conservative, we propose an adjusted covariance estimator to improve the empirical coverage rates. Although we focus on regression-based point estimators and standard errors, our theory holds under the design-based framework, which assumes that the randomness comes solely from the design of network experiments and allows for arbitrary misspecification of the regression models.
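
The numerical identity recalled above (the Hajek estimator equals the exposure coefficient from a weighted-least-squares fit with inverse-probability-of-exposure weights) can be checked directly. The following small sketch uses synthetic data and a simple binary exposure purely for illustration; it is not the paper's setup, where the exposure mapping is defined over a network.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Synthetic exposure probabilities and exposure indicators (known by design).
prob = rng.uniform(0.2, 0.8, size=n)
D = rng.binomial(1, prob)
y = 2.0 * D + rng.standard_normal(n)

# Inverse probability of the observed exposure.
w = np.where(D == 1, 1.0 / prob, 1.0 / (1.0 - prob))

# Hajek estimator: difference of weighted outcome means across exposure groups.
hajek = (np.average(y[D == 1], weights=w[D == 1])
         - np.average(y[D == 0], weights=w[D == 0]))

# Weighted least squares of y on an intercept and the exposure indicator,
# using the same inverse-probability weights.
X = np.column_stack([np.ones(n), D])
XtW = X.T * w                       # equivalent to X.T @ diag(w)
coef = np.linalg.solve(XtW @ X, XtW @ y)

print(hajek, coef[1])               # the two numbers coincide (up to rounding)
```

The equality is exact because the regression is saturated in the binary exposure, so the weighted-least-squares fit simply returns the difference of inverse-probability-weighted group means.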

Biography:

Peng Ding is an Associate Professor in the Department of Statistics at UC Berkeley. He obtained his Ph.D. from the Department of Statistics, Harvard University in May 2015, and worked as a postdoctoral researcher in the Department of Epidemiology, Harvard T. H. Chan School of Public Health until December 2015.