Thursday, 11/21/2024, Time: 11:00am-12:15pm, A Panel Discussion with Professor Nancy Reid
Location: Haines Hall A25
Nancy Reid, Professor
Department of Statistical Science, University of Toronto
Abstract:
A panel discussion with Professor Nancy Reid will take place.
Bio:
Nancy Reid is University Professor in the Department of Statistical Sciences at the University of Toronto. Her research interests include statistical theory, likelihood inference, design of studies, and statistical science in public policy. She has held many professional leadership roles in statistical science, in Canada and abroad. Nancy studied at the University of Waterloo (B.Math. 1974), the University of British Columbia (M.Sc. 1976), Stanford University (PhD 1979) and Imperial College, London (PDF 1980). She joined the University of Toronto in 1986 from the University of British Columbia. Nancy is a Fellow of the Royal Society, the Royal Society of Canada, the American Association for the Advancement of Science, and a Foreign Associate of the National Academy of Sciences. In 2015 she was appointed Officer of the Order of Canada. In 2023 she received the David R. Cox Foundations of Statistics Award, “for contributions to the foundations of statistics that significantly advanced the frontiers of statistics and for insight that transformed understanding of parametric statistical inference”.
Wednesday, 11/20/2024, Time: 3:30pm-4:30pm, When Likelihood Goes Wrong
Location: CHS 43105A
Joint Seminar hosted by UCLA Biostatistics
Nancy Reid, Professor
Department of Statistical Science, University of Toronto
Abstract:
Inference based on the likelihood function is the workhorse of statistics, and constructing the likelihood function is often the first step in any detailed analysis, even for very complex data. At the same time, statistical theory tells us that ‘black-box’ use of likelihood inference can be very sensitive to the dimension of the parameter space, the structure of the parameter space, and any measurement error in the data. This has been recognized for a long time, and many alternative approaches have been suggested with a view to preserving some of the virtues of likelihood inference while ameliorating some of the difficulties. In this talk I will discuss some of the ways that likelihood inference can go wrong, and some of the potential remedies, with particular emphasis on model misspecification.
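The abstract does not name a specific example, but a classical illustration of likelihood inference going wrong as the dimension of the parameter space grows is the Neyman-Scott problem: with one nuisance mean per pair of observations, the maximum likelihood estimator of the common variance converges to half the truth, no matter how much data arrive. A minimal simulation sketch (assuming NumPy; not from the talk):

```python
import numpy as np

# Neyman-Scott: n pairs, each with its own nuisance mean mu_i and a common
# variance sigma^2. The number of parameters grows with n, and the MLE of
# sigma^2 converges to sigma^2 / 2 rather than sigma^2.
rng = np.random.default_rng(0)
sigma2 = 4.0
for n in [100, 10_000, 1_000_000]:
    mu = rng.normal(0, 10, size=n)                      # one nuisance mean per pair
    x = rng.normal(mu[:, None], np.sqrt(sigma2), size=(n, 2))
    mu_hat = x.mean(axis=1)                             # MLE of each mu_i
    sigma2_mle = ((x - mu_hat[:, None]) ** 2).mean()    # joint MLE of sigma^2
    print(n, sigma2_mle)                                # approaches 2.0, not 4.0
```

Remedies of the kind the abstract alludes to, such as marginal or modified profile likelihood (REML in this example), restore consistency here.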
Bio:
(See bio above.)
Thursday, 11/14/2024, Time: 11:00am-12:15pm, Advancing Statistical Rigor in Educational Assessment and Single-Cell Omics Using In Silico Control Data
Location: Haines Hall A25
Guanao Yan, Ph.D. Student
UCLA Department of Statistics and Data Science
Abstract:
In this talk, I will explore how in silico control data can be used to enhance statistical rigor in two distinct fields: educational assessment and single-cell omics data analysis. First, I will discuss the application of in silico data in educational contexts to promote fairness in assessment. Specifically, I will highlight my work on developing statistical tools to detect patterns of collusion in online exams. By incorporating in silico data as negative controls, we can quantify errors—such as false positives—ensuring that educational assessments accurately reflect true student performance. Next, I will address challenges in single-cell data analysis, particularly the complexity of selecting the right tool from over 1,700 available computational methods. One promising solution is the generation and use of synthetic data as positive controls. This approach establishes trustworthy evaluation standards, enabling more accurate method comparisons and providing rigorous evidence to advance our understanding of cellular biology.
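The collusion-detection statistic from the talk is not reproduced here; the sketch below (assuming NumPy, with made-up names) illustrates only the negative-control logic: simulate exams taken independently, so that any flagged pair is by construction a false positive, and calibrate the detection threshold against that null. The uniform-random null is deliberately crude; a realistic one would condition on item difficulty and student ability.

```python
import numpy as np

# Hypothetical sketch of in silico negative controls: build the null
# distribution of a pairwise-agreement statistic from simulated exams with
# no collusion, then use its upper quantile as a detection threshold with a
# known false-positive rate.
rng = np.random.default_rng(1)
n_students, n_items, n_choices = 200, 40, 4

def max_pairwise_agreement(answers):
    """Largest number of identical answers over all student pairs."""
    best = 0
    for i in range(len(answers)):
        matches = (answers[i] == answers[i + 1:]).sum(axis=1)
        if len(matches):
            best = max(best, int(matches.max()))
    return best

# Null distribution from in silico (independent, uniformly random) exams.
null_stats = [
    max_pairwise_agreement(rng.integers(0, n_choices, (n_students, n_items)))
    for _ in range(500)
]
threshold = np.quantile(null_stats, 0.999)  # ~0.1% family-wise false-positive rate
print("flag pairs agreeing on more than", threshold, "items")
```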
Bio:
Guanao Yan is a fifth-year PhD student in the Department of Statistics and Data Science at the University of California, Los Angeles (UCLA), where he is supervised by Dr. Jingyi Jessica Li. His research focuses on developing novel statistical methods to address real-world data challenges, spanning multiple fields. In statistical bioinformatics, he specializes in analyzing single-cell and spatial omics data, with a particular emphasis on using synthetic data to improve the statistical rigor of these analyses. His work also extends to general statistical methodologies, including high-dimensional model inference and variable selection, as well as to education, where he develops statistical methods to promote equity by ensuring fair and accurate assessments in educational systems.
Thursday, 11/07/2024, Time: 11:00am-12:15pm, Selecting Informative Subdata from a Large Dataset
Location: Haines Hall A25
John Stufken, Professor
Department of Statistics, George Mason University
Abstract:
Exploration or inference for large datasets can be computationally expensive. Depending on the objectives, it is often possible to obtain reliable results without using all of the observations. This has resulted in the development of methods for selecting some of the observations from a large dataset, either through a stochastic or a deterministic approach, and drawing conclusions based on the selected data only. Ideally, such selection methods are robust to model misspecification and to the choice of objective (e.g., exploration, estimation, prediction), and are computationally efficient. I will present a selective overview of methods developed in recent years for selecting informative subdata.
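One concrete deterministic method from this literature, due to the speaker and coauthors (Wang, Yang and Stufken, 2019), is information-based optimal subdata selection (IBOSS) for linear regression: keep the observations that are most extreme in each covariate, since extreme design points are the most informative about the slopes. A simplified sketch (assuming NumPy; the published algorithm processes covariates sequentially and excludes already-selected rows, a refinement omitted here):

```python
import numpy as np

# IBOSS-style deterministic subdata selection for linear regression:
# for each covariate, keep the rows with the smallest and largest values.
def iboss_indices(X, k):
    """Select roughly k rows of X by taking k/(2p) extremes per covariate."""
    n, p = X.shape
    r = max(1, k // (2 * p))           # extremes to take at each end, per column
    chosen = set()
    for j in range(p):
        order = np.argsort(X[:, j])
        chosen.update(order[:r])        # r smallest values of covariate j
        chosen.update(order[-r:])       # r largest values of covariate j
    return np.fromiter(chosen, dtype=int)

rng = np.random.default_rng(2)
X = rng.normal(size=(100_000, 5))
idx = iboss_indices(X, k=1000)          # fit the regression on X[idx] only
```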
Bio:
John Stufken is Professor in the Department of Statistics at George Mason University, where he also serves as Associate Chair for Research. Prior to this he was Bank of America Excellence Professor and Director for Informatics and Analytics at UNC Greensboro (2019-2022), Charles Wexler Endowed Professor and Coordinator for Statistics at Arizona State University (2014-2019), Head of the Department of Statistics at the University of Georgia (2003-2014), Program Director for Statistics at the National Science Foundation (2000-2003), Assistant, Associate and Full Professor in the Department of Statistics at Iowa State University (1988-2002), and Assistant and Associate Professor in the Department of Statistics at the University of Georgia (1986-1990). Stufken’s research interests are in design and analysis of experiments and subsampling of big data. He currently serves as co-Editor for Statistica Sinica (2023-2026), and has in the past been Editor for The American Statistician (2009-2011) and the Journal of Statistical Planning and Inference (2004-2006). He is an Elected Fellow of the ASA and IMS, an Elected Member of the ISI, and served as the Rothschild Distinguished Visiting Fellow at the Isaac Newton Institute for Mathematical Sciences during the 2011 workshop on Design and Analysis of Experiments.
Thursday, 10/31/2024, Time: 11:00am-12:15pm, How to Be a Good and a Bad Statistician: Ethical Considerations in Model Construction
Location: Haines Hall A25
Mahtash Esfandiari, Senior Continuing Lecturer
UCLA Department of Statistics and Data Science
Ariana Anderson, Assistant Professor-in-Residence
Psychiatry and Biobehavioral Science, UCLA School of Medicine
Wenhong Sun, Senior
UCLA Department of Statistics and Data Science
Abstract:
In this presentation, we will: 1) give a short description of the ethical guidelines of the American Statistical Association and the Royal Statistical Society; 2) introduce four major ethical considerations in machine learning research, proposed by Toms and Whithworth, which emerged from an extensive literature review, collaboration with national statistics institutes, and feedback from stakeholders; and 3) provide real-world examples in which these four ethical considerations are jeopardized.
We will then introduce a case study in which multiple teams of Statistics and Data Science seniors enrolled in our Capstone course (Introduction to the Practice of Statistical Consulting) applied different machine learning methods to analyze baby-cry data from Kaggle while playing the roles of good and bad statisticians, with Mahtash Esfandiari as their instructor, Dr. Ariana Anderson as the client, and Wenhong Sun as one of the participating team members.
Finally, we will: 1) present a meta-analysis of the accuracy of the predictive models built by the teams playing good and bad statisticians, and 2) discuss which of the four ethical guidelines could potentially be jeopardized in predicting the type of baby cry from the Kaggle data.
Bio:
Dr. Mahtash Esfandiari is a full-time faculty member in the Department of Statistics and Data Science at UCLA. She obtained her PhD in cognitive science and applied statistics from the University of Illinois at Urbana-Champaign. She has extensive consulting experience in the medical sciences, industry, and the evaluation of interventions. Her research interests include the application of cognitive science to the learning and teaching of statistics, diversity, and emotional intelligence (EQ) as a major component of successful statistical consulting. Over her career, Dr. Esfandiari has obtained several research grants from the National Science Foundation and the UCLA College of Letters and Science and has done innovative work in statistics education, diversity, and statistical consulting.
Dr. Ariana Anderson is an Assistant Professor in the Departments of Psychiatry and Biobehavioral Sciences and Statistics at the University of California, Los Angeles (UCLA). She earned her Bachelor’s degree in Mathematics, followed by a Master’s and Doctorate in Statistics, and completed postdoctoral training in neuroimaging and psychiatry at UCLA’s Department of Psychiatry and Biobehavioral Sciences. For over 13 years, she has been developing new statistical methodologies to enhance the analysis of clinical and brain imaging data. Dr. Anderson’s research focuses on mapping structural and functional brain changes related to pathological processes, utilizing both unsupervised and supervised MRI-based neuroimaging statistical techniques. She serves as the Principal Investigator on multiple NIH-funded brain-imaging studies that explore how changes in brain structure and function due to disease contribute to cognitive decline.
Wenhong Sun is a senior in the Department of Statistics and Data Science at UCLA.
Thursday, 10/17/2024, Time: 11:00am-12:15pm, Wasserstein Regression of Covariance Matrix on Vector Covariates for Single Cell Gene Co-expression Analysis
Location: Haines Hall A25
Hongzhe Li, Professor
Biostatistics, Epidemiology and Informatics, Perelman School of Medicine at the University of Pennsylvania
Abstract:
Population-level single-cell gene expression data captures the gene expressions of thousands of cells from each individual within a sizable cohort. This data enables the construction of cell-type- and individual-specific gene co-expression networks by estimating the covariance matrices. Understanding how such co-expression networks are associated with individual-level covariates is crucial. This paper considers Fréchet regression with the covariance matrix as the outcome and vector covariates, using the Wasserstein distance between covariance matrices as a substitute for the Euclidean distance. A test statistic is defined based on the Fréchet mean and covariate-weighted Fréchet mean. The asymptotic distribution of the test statistic is derived under the assumption of simultaneously diagonalizable covariance matrices. Results from an analysis of large-scale single-cell data reveal an association between the co-expression network of genes in the nutrient sensing pathway and age, indicating a perturbation in gene co-expression networks with aging. More general Fréchet regression on the Bures-Wasserstein manifold will also be discussed and applied to the same single-cell RNA-seq data.
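For context (standard background, not the paper's specific contribution): the Wasserstein distance between covariance matrices used above, the Bures-Wasserstein distance, is the 2-Wasserstein distance between centered Gaussians with those covariances, and it has a closed form. Fréchet regression then replaces the ordinary mean by a weighted Fréchet mean under this metric:

```latex
% Closed form of the Bures-Wasserstein distance between covariance matrices,
% and the covariate-weighted Frechet mean used in Frechet regression.
\[
  d_W^2(\Sigma_1, \Sigma_2)
    = \operatorname{tr}(\Sigma_1) + \operatorname{tr}(\Sigma_2)
      - 2\,\operatorname{tr}\!\Bigl[\bigl(\Sigma_1^{1/2}\,\Sigma_2\,\Sigma_1^{1/2}\bigr)^{1/2}\Bigr]
\]
\[
  \hat{m}(x) = \operatorname*{arg\,min}_{\Sigma \succeq 0}
    \sum_{i=1}^{n} w_i(x)\, d_W^2(\Sigma_i, \Sigma)
\]
```

Here the $w_i(x)$ are the Fréchet regression weights; setting $w_i(x) = 1/n$ recovers the unweighted Fréchet mean against which the talk's test statistic compares.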
Biography:
Dr. Hongzhe Li is Perelman Professor of Biostatistics, Epidemiology and Informatics at the Perelman School of Medicine at the University of Pennsylvania. He is Vice Chair for Research Integration, Director of the Center of Statistics in Big Data, and former Chair of the Graduate Program in Biostatistics at Penn. He is also a Professor of Statistics and Data Science at the Wharton School and a Professor of Applied Mathematics and Computational Science at Penn. Dr. Li is an elected Fellow of the American Statistical Association (ASA), the Institute of Mathematical Statistics (IMS), and the American Association for the Advancement of Science (AAAS). He served on the Board of Scientific Counselors of the National Cancer Institute of the NIH and regularly serves on various NIH study sections. He served as Chair of the Section on Statistics in Genomics and Genetics of the ASA and as Co-Editor-in-Chief of Statistics in Biosciences. Dr. Li's research focuses on developing statistical and computational methods for the analysis of large-scale genetic, genomic, and metagenomic data, as well as theory in high-dimensional statistics. He has trained over 50 PhD students and postdoctoral fellows.
Thursday, 10/10/2024, Time: 11:00am-12:15pm, A power law Hawkes process modelling earthquake occurrences
Location: Haines Hall A25
Boris Baeumer, Professor
Department of Mathematics and Statistics, University of Otago
Abstract:
In order to capture the increased frequency of earthquakes (aftershocks) after a large event, we use a Hawkes process model based on the first relaxation eigenmode of a visco-elastic plate model; i.e., we assume the kernel functions of the Hawkes model are Mittag-Leffler functions. Assuming that magnitude and frequency are separable leads to a model that, for most databases, outperforms Ogata's ETAS model in predicting earthquake frequency. Once we remove the restrictive assumption that magnitude and frequency are separable, obtaining a parsimonious model of the joint process requires modelling the impact of an earthquake of a given magnitude on the intensity measures of earthquakes of all magnitudes. We use marked multivariate Hawkes processes to inform the shape of a parsimonious kernel.
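As background (one common parameterization with a unit time scale; the talk's exact form may differ), a Hawkes process has a conditional intensity that is excited by past events, and the Mittag-Leffler kernel is typically taken to be the Mittag-Leffler probability density:

```latex
% Hawkes conditional intensity with a Mittag-Leffler kernel
% (unit time scale assumed for illustration).
\[
  \lambda(t) = \mu + \eta \sum_{t_i < t} \varphi(t - t_i), \qquad
  \varphi(t) = t^{\beta-1} E_{\beta,\beta}\bigl(-t^{\beta}\bigr), \qquad
  E_{\alpha,\beta}(z) = \sum_{k=0}^{\infty} \frac{z^k}{\Gamma(\alpha k + \beta)}
\]
```

For $0 < \beta < 1$, $\varphi$ integrates to one and decays like $t^{-\beta-1}$, the Omori-type power-law aftershock decay that gives the model its name; $\beta = 1$ recovers the classical exponential kernel.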
Biography:
Boris Baeumer is a Professor in the Department of Mathematics and Statistics at the University of Otago, New Zealand. He obtained his PhD in Mathematics from Louisiana State University. His research interests include non-local PDEs and associated stochastic processes. Over his career he has obtained several major research grants and is currently (co-)principal investigator on the project “Modelling the domino effect in complex systems”.
Thursday, 10/03/2024, Time: 11:00am-12:15pm, Causal inference in network experiments: regression-based analysis and design-based properties
Location: Haines Hall A25
Peng Ding, Associate Professor
Department of Statistics, UC Berkeley
Abstract:
Investigating interference or spillover effects among units is a central task in many social science problems. Network experiments are powerful tools for this task: they avoid endogeneity by randomly assigning treatments to units over networks. However, it is non-trivial to analyze network experiments properly without imposing strong modeling assumptions. Many researchers have proposed sophisticated point estimators and standard errors for causal effects under network experiments. We show that regression-based point estimators and standard errors can have strong theoretical guarantees if the regression functions and robust standard errors are carefully specified to accommodate the interference patterns of the network experiment. We first recall the well-known result that the Hajek estimator is numerically identical to the coefficient from a weighted-least-squares fit based on the inverse probability of the exposure mapping. We then demonstrate that the regression-based approach offers three notable advantages: ease of implementation, the ability to derive standard errors from the same weighted-least-squares fit, and the capacity to integrate covariates into the analysis, thereby enhancing estimation efficiency. Furthermore, we analyze the asymptotic bias of the regression-based network-robust standard errors. Recognizing that the covariance estimator can be anti-conservative, we propose an adjusted covariance estimator that improves empirical coverage rates. Although we focus on regression-based point estimators and standard errors, our theory holds under the design-based framework, which assumes that the randomness comes solely from the design of the network experiment and allows for arbitrary misspecification of the regression models.
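The numerical identity recalled above is easy to verify directly. Below is a minimal sketch (not from the talk; a simple binary exposure stands in for a general exposure mapping, and the probabilities and outcome model are made up for illustration, assuming NumPy) showing that the Hajek estimator coincides with the WLS coefficient on the exposure indicator when observations are weighted by the inverse probability of their realized exposure:

```python
import numpy as np

# Check: Hajek estimator == coefficient on the exposure indicator from a
# weighted least squares fit with inverse-probability weights.
rng = np.random.default_rng(3)
n = 500
pi = rng.uniform(0.2, 0.8, n)             # Pr(unit i is exposed), known by design
d = rng.binomial(1, pi)                   # realized exposure indicator
y = 1.0 + 2.0 * d + rng.normal(size=n)    # outcomes (illustrative model)
w = d / pi + (1 - d) / (1 - pi)           # inverse probability of realized exposure

# Hajek estimator: difference of IPW-weighted means.
hajek = (np.average(y[d == 1], weights=w[d == 1])
         - np.average(y[d == 0], weights=w[d == 0]))

# WLS of y on [1, d] with weights w: solve (X' W X) beta = X' W y.
X = np.column_stack([np.ones(n), d])
beta = np.linalg.solve(X.T * w @ X, X.T * w @ y)
print(hajek, beta[1])                     # identical up to floating point
```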
Biography:
Peng Ding is an Associate Professor in the Department of Statistics at UC Berkeley. He obtained his Ph.D. from the Department of Statistics, Harvard University in May 2015, and worked as a postdoctoral researcher in the Department of Epidemiology, Harvard T. H. Chan School of Public Health until December 2015.