2023 – 2024 Acad. Year

Thursday, 11/30/2023, Time: 11:00am – 12:00pm, Contained Chaos: Quality Assurance for the Community Earth System Model

This seminar is part of the Joint Statistics and Biostatistics Seminar Series.

Dorit Hammerling, Associate Professor
Department of Applied Mathematics and Statistics, Colorado School of Mines

Location: online

Abstract:

State-of-the-science climate models are valuable tools for understanding past and present climates and are particularly vital for addressing otherwise intractable questions about future climate scenarios. The National Center for Atmospheric Research leads the development of the popular Community Earth System Model (CESM), which models the Earth system by simulating its major components (atmosphere, ocean, land, river, ice, etc.) and the interactions between them. These complex processes result in a model that is inherently chaotic, meaning that small perturbations can cause large effects. For this reason, ensemble methods are common in climate studies, as a collection of simulations is needed to understand and characterize this uncertainty in the climate model system. While climate scientists typically use initial condition perturbations to create ensemble spread, similar effects can result from seemingly minor changes to the hardware or software stack. This sensitivity makes quality assurance challenging, and defining “correctness” separately from bit-reproducibility becomes a practical necessity. Our approach casts correctness in terms of statistical distinguishability, so the problem becomes one of making decisions under uncertainty in a high-dimensional variable space. We developed a statistical testing framework that can be thought of as hypothesis testing combined with Principal Component Analysis (PCA). One key advantage of this approach for settings with hundreds of output variables is that it captures not only changes in individual variables but also the relationships between variables. This testing framework, referred to as “Ensemble Consistency Testing,” has been successfully implemented and used for the last few years, and we will provide an overview of this multi-year effort and highlight ongoing developments.

Two good background papers for this talk are:
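To make the testing idea concrete, here is a minimal sketch of a PCA-based ensemble consistency check in Python. It is a generic illustration under simplifying assumptions (synthetic data standing in for global means, an arbitrary two-standard-deviation threshold, and an arbitrary failure count), not the CESM-ECT/pyCECT implementation described in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Accepted ensemble: n_runs simulations x n_vars global-mean output variables (synthetic here).
n_runs, n_vars = 150, 100
mixing = rng.normal(size=(n_vars, n_vars)) / np.sqrt(n_vars)   # induce cross-variable correlation
ensemble = rng.normal(size=(n_runs, n_vars)) @ mixing

# A new run to test; drawn from the same distribution here, so it should pass.
new_run = rng.normal(size=n_vars) @ mixing

# 1. Standardize each variable using the ensemble statistics.
mu, sigma = ensemble.mean(axis=0), ensemble.std(axis=0, ddof=1)
Z = (ensemble - mu) / sigma
z_new = (new_run - mu) / sigma

# 2. PCA via SVD of the standardized ensemble; keep the leading components.
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
n_pc = 50
scores = Z @ Vt[:n_pc].T                     # ensemble PC scores
score_sd = scores.std(axis=0, ddof=1)
new_scores = z_new @ Vt[:n_pc].T

# 3. "Hypothesis test": flag PCs whose new-run score lies far outside the ensemble
#    spread, and fail the run if too many PCs are flagged (thresholds are illustrative).
flagged = np.abs(new_scores) > 2.0 * score_sd
print(f"{flagged.sum()} of {n_pc} PCs flagged")
print("consistent" if flagged.sum() <= 3 else "inconsistent")
```

Working in PC space is what lets a check like this pick up changes in the relationships between variables, not just shifts in individual variables.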

Bio:

Dr. Hammerling obtained an M.A. and a Ph.D. (2012) from the University of Michigan in Statistics and Engineering, followed by a post-doctoral fellowship at the Statistical and Applied Mathematical Sciences Institute in the program on statistical inference for massive data. She then joined the National Center for Atmospheric Research, where she led the statistics group within the Institute for Mathematics Applied to Geosciences and worked in the Machine Learning division before becoming an Associate Professor in Applied Mathematics and Statistics at the Colorado School of Mines in January 2019.

Wednesday, 11/29/2023, Time: 3:30pm – 4:45pm, Individualized Dynamic Model for Multi-resolutional Data

This seminar is part of the Joint Statistics and Biostatistics Seminar Series.

Annie Qu, Chancellor’s Professor
Department of Statistics, University of California – Irvine

Location: 43-105 CHS Building

Abstract:

Mobile health has emerged as a major success in tracking individual health status, due to the popularity and power of smartphones and wearable devices. It has also brought great challenges in handling heterogeneous, multi-resolution data, which arise ubiquitously in mobile health because of the irregular multivariate measurements collected from individuals. In this talk, we propose an individualized dynamic latent factor model for irregular multi-resolution time series data to interpolate unsampled measurements of low-resolution time series. One major advantage of the proposed method is its capability to integrate multiple irregular time series and multiple subjects by mapping the multi-resolution data to the latent space. In addition, the proposed individualized dynamic latent factor model captures heterogeneous longitudinal information through individualized dynamic latent factors. In theory, we provide the interpolation error bound of the proposed estimator and derive the convergence rate with non-parametric approximation methods. Both simulation studies and an application to smartwatch data demonstrate the superior performance of the proposed method compared to existing methods.
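As a rough illustration of this kind of model, the sketch below fits shared smooth latent factors (represented with a simple Gaussian-bump basis as the non-parametric approximation) to two irregularly sampled series of different resolution via alternating least squares, then interpolates the low-resolution series through the latent space. The basis, dimensions, and fitting scheme are assumptions made for the example, not the estimator proposed in the talk.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two smooth latent factors drive both observed series.
def truth(t):
    return np.stack([np.sin(2 * np.pi * t), np.cos(4 * np.pi * t)], axis=-1)

A_true = np.array([[1.0, 0.3],      # loadings of series 0 (high resolution)
                   [-0.5, 1.2]])    # loadings of series 1 (low resolution)
t_hi = np.sort(rng.uniform(0, 1, 200))   # densely, irregularly sampled
t_lo = np.sort(rng.uniform(0, 1, 25))    # sparsely sampled; to be interpolated
y_hi = truth(t_hi) @ A_true[0] + 0.05 * rng.normal(size=t_hi.size)
y_lo = truth(t_lo) @ A_true[1] + 0.05 * rng.normal(size=t_lo.size)
series = [(t_hi, y_hi), (t_lo, y_lo)]

n_basis, n_factors = 15, 2
centers, width = np.linspace(0, 1, n_basis), 0.08

def basis(t):
    # Gaussian-bump basis as a stand-in for the non-parametric approximation.
    return np.exp(-0.5 * ((t[:, None] - centers[None, :]) / width) ** 2)

# Alternating least squares for shared basis coefficients W and per-series loadings A.
W = 0.1 * rng.normal(size=(n_basis, n_factors))
A = 0.1 * rng.normal(size=(len(series), n_factors))
for _ in range(200):
    for j, (t, y) in enumerate(series):          # update loadings given factors F(t) = basis(t) @ W
        F = basis(t) @ W
        A[j] = np.linalg.lstsq(F, y, rcond=None)[0]
    # Update W given loadings: y_j(t) = basis(t) @ W @ A[j], solved as a Kronecker least squares.
    X = np.vstack([np.kron(A[j][None, :], basis(t)) for j, (t, _) in enumerate(series)])
    Y = np.concatenate([y for _, y in series])
    W = np.linalg.lstsq(X, Y, rcond=None)[0].reshape(n_factors, n_basis).T

# Interpolate the sparse series on a dense grid through the shared latent factors.
t_grid = np.linspace(0, 1, 500)
y_lo_hat = basis(t_grid) @ W @ A[1]
print("max interpolation error:", float(np.max(np.abs(y_lo_hat - truth(t_grid) @ A_true[1]))))
```

Because the basis coefficients are shared across series, the densely sampled series effectively pins down the latent factors from which the sparse series is then interpolated.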

Bio:

Professor Qu’s research focuses on solving fundamental issues regarding structured and unstructured large-scale data, and on developing cutting-edge statistical methods, theory, and machine learning algorithms for personalized medicine, text mining, recommender systems, medical imaging, and network data analysis with complex heterogeneous data.

Qu received her Ph.D. in Statistics from the Pennsylvania State University. Before joining UC Irvine, Professor Qu was the Data Science Founder Professor of Statistics and the Director of the Illinois Statistics Office at the University of Illinois at Urbana-Champaign. She was named a Brad and Karen Smith Professorial Scholar by the College of LAS at UIUC and received an NSF CAREER Award (2004-2009). She is a Fellow of the Institute of Mathematical Statistics, a Fellow of the American Statistical Association, and a Fellow of the American Association for the Advancement of Science. She is also a recipient of a 2024 IMS Medallion Award and Lectureship, and she serves as co-editor of JASA Theory and Methods for 2023-2025.

Thursday, 11/02/2023, Time: 11:00am – 12:00pm, Introduction to the Research Areas of the Faculty (Part 2)

The Faculty of the Department of Statistics and Data Science

Location: Haines Hall A25

The purpose of this seminar is to give an overview of the current research areas of the faculty in the Department of Statistics and Data Science. Each faculty member will give a flash talk previewing their research and how you can get involved in it. They will focus on areas with open problems suitable for Ph.D. dissertations and M.S. or M.A.S. theses. The faculty presenting will be those who did not do so last week.

This is a great way to get a sense of what the faculty do as individuals and what the department does as a whole. There will be time for questions, and they are encouraged.

Videos and information on the faculty’s research are already available here and can be previewed before the meeting.

Thursday, 10/26/2023, Time: 11:00am – 12:00pm, Introduction to the Research Areas of the Faculty (Part 1)

The Faculty of the Department of Statistics and Data Science

Location: Haines Hall A25

The purpose of this seminar is to give an overview of the current research areas of the faculty in the Department of Statistics and Data Science. Each faculty member will give a flash talk previewing their research and how you can get involved in it. They will focus on areas with open problems suitable for Ph.D. dissertations and M.S. or M.A.S. theses.

This is a great way to get a sense of what the faculty do as individuals and what the department does as a whole. There will be time for questions, and they are encouraged.

Videos and information on the faculty’s research are already available here and can be previewed before the meeting.

Thursday, 10/19/2023, Time: 11:00am – 12:15pm, An Integrative Approach to Understanding the Brain Computation: Challenges, Opportunities and Methodologies

Paul Bogdan, Associate Professor
Electrical and Computer Engineering, University of Southern California

Location: Haines Hall A25

Abstract:

Brains build compact models of the world from just a few noisy and conflicting observations. They predict events via memory-based analogies even when resources are limited. The ability of biological intelligence to generalize and complete a wide range of unknown heterogeneous tasks calls for a comprehensive understanding of how networks of interactions among neurons, glia, and vascular systems enable human cognition. This will serve as a basis for advancing the design of artificial general intelligence (AGI). In this talk, we introduce a series of novel mathematical tools that can help us reconstruct networks among neurons, infer their objectives, and identify their learning rules. To decode the network structure from very scarce and noisy data, we develop the first mathematical framework that identifies the emerging causal fractal memory phenomenon in spike trains and neural network topologies. We show that the fractional-order operators governing the neuronal spiking dynamics provide insight into the topological properties of the underlying neuronal networks and improve the prediction of animal behavior during cognitive tasks. In addition, we propose a variational expectation-maximization approach to mine optical imaging of brain activity and reconstruct the neuronal network generator, namely the weighted multifractal graph generator. Our proposed network generator inference framework can reproduce network properties, differentiate varying structures in brain networks and chromosomal interactions, and detect topologically associating domain regions in conformation maps of the human genome. Moreover, we develop a multiwavelet-based neural operator to infer the objectives and learning rules of complex biological systems. We thus learn the operator kernel of an unknown partial differential equation (PDE) from noisy, scarce data. For time-varying PDEs, this model exhibits 2-10X higher accuracy than state-of-the-art machine learning tools.
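The fractal-memory theme above can be illustrated with a generic long-range-dependence estimate on a spike-count series. The sketch below uses detrended fluctuation analysis (DFA), a standard way to quantify long-memory (fractional) dynamics; it is not the speaker’s framework, and the simulated data, scales, and rate model are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)

def dfa(x, scales):
    """Detrended fluctuation analysis: return the scaling exponent alpha."""
    y = np.cumsum(x - np.mean(x))                                # integrated profile
    F = []
    for n in scales:
        n_seg = len(y) // n
        segs = y[: n_seg * n].reshape(n_seg, n)
        t = np.arange(n)
        coef = np.polynomial.polynomial.polyfit(t, segs.T, 1)    # linear trend per segment
        trend = np.polynomial.polynomial.polyval(t, coef)
        F.append(np.sqrt(np.mean((segs - trend) ** 2)))          # RMS fluctuation at scale n
    return np.polyfit(np.log(scales), np.log(F), 1)[0]

T = 20000
# Poisson spike counts with a constant rate: essentially memoryless, alpha near 0.5.
counts_iid = rng.poisson(5.0, size=T)
# Poisson counts whose rate drifts slowly: long-range temporal correlations push alpha above 0.5.
rate = np.clip(5.0 + np.cumsum(rng.normal(scale=0.02, size=T)), 0.1, None)
counts_lrd = rng.poisson(rate)

scales = np.unique(np.logspace(1, 3, 12).astype(int))
print("alpha, constant rate :", round(dfa(counts_iid, scales), 2))
print("alpha, drifting rate :", round(dfa(counts_lrd, scales), 2))
```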

Bio:

Paul Bogdan is the Jack Munushian Early Career Chair and Associate Professor in the Ming Hsieh Department of Electrical and Computer Engineering at the University of Southern California. He received his Ph.D. degree in Electrical & Computer Engineering from Carnegie Mellon University. His work has been recognized with a number of honors and distinctions, including the 2021 DoD Trusted Artificial Intelligence (TAI) Challenge award, the USC Stevens Center 2021 Technology Advancement Award for the AI framework for SARS-CoV-2 vaccine design, the 2019 Defense Advanced Research Projects Agency (DARPA) Director’s Fellowship award, the 2018 IEEE CEDA Ernest S. Kuh Early Career Award, the 2017 DARPA Young Faculty Award, the 2017 Okawa Foundation Award, the 2015 National Science Foundation (NSF) CAREER award, the 2012 A.G. Jordan Award from Carnegie Mellon University for an outstanding Ph.D. thesis and service, and several best paper awards. His research interests include cyber-physical systems, new computational cognitive neuroscience tools for deciphering biological intelligence, the quantification of the degree of trustworthiness and self-optimization of AI systems, new machine learning techniques for complex multi-modal data, the control of complex time-varying networks, the modeling and analysis of biological systems and swarms, new control techniques for dynamical systems exhibiting multi-fractal characteristics, and performance analysis and design methodologies for heterogeneous manycore systems.

Thursday, 10/12/2023, Time: 11:00am – 12:15pm, Representation-based Reinforcement Learning

Bo Dai, Assistant Professor / Staff Research Scientist
Georgia Tech and Google DeepMind

Location: Haines Hall A25

Abstract:

The majority of reinforcement learning (RL) algorithms are categorized as model-free or model-based according to whether a simulation model is used in the algorithm. However, both categories have their own issues, especially when combined with function approximation: exploration with arbitrary function approximation is difficult in model-free RL algorithms, while optimal planning becomes intractable in model-based RL algorithms with neural simulators. In this talk, I will present our recent work on exploiting the power of representation in RL to bypass these difficulties. Specifically, we designed practical algorithms for extracting useful representations, with the goal of improving statistical and computational efficiency in the exploration-exploitation tradeoff and empirical performance in RL. We provide rigorous theoretical analysis of our algorithms and demonstrate superior practical performance over existing state-of-the-art empirical algorithms on several benchmarks.
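As a generic illustration of what planning in a learned representation can buy (not the speaker’s algorithms), the sketch below builds a low-rank MDP under a fixed policy, learns a compact spectral representation of its transition operator via truncated SVD, and then evaluates the policy with ordinary linear least-squares machinery in that feature space. All dimensions and the construction of the MDP are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(3)
S, d, gamma = 200, 5, 0.95

# A fixed-policy MDP whose transition matrix is exactly rank d (a "low-rank MDP").
Phi_star = rng.dirichlet(np.ones(d), size=S)   # S x d: state -> latent-state distribution
Psi = rng.dirichlet(np.ones(S), size=d)        # d x S: latent state -> next-state distribution
P = Phi_star @ Psi                             # S x S row-stochastic, rank <= d
r = Phi_star @ rng.normal(size=d)              # reward lying in the feature span

# Ground-truth value function of the policy, for comparison.
v_true = np.linalg.solve(np.eye(S) - gamma * P, r)

# "Representation learning": a d-dimensional spectral feature map from the transition operator.
U, s, _ = np.linalg.svd(P)
Phi = U[:, :d] * s[:d]                         # S x d learned features

# Policy evaluation entirely in the learned representation (linear fixed-point / LSTD solution).
w = np.linalg.solve(Phi.T @ (Phi - gamma * P @ Phi), Phi.T @ r)
v_hat = Phi @ w

print("d =", d, "features for", S, "states; max value error:",
      float(np.max(np.abs(v_hat - v_true))))
```

Because the toy MDP is constructed to be exactly low-rank, the d-dimensional features suffice to recover the value function essentially to numerical precision; with real data the representation would be learned from samples and the recovery would only be approximate.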

Bio:

Bo Dai is an assistant professor at Georgia Tech and a staff research scientist at Google DeepMind. He obtained his Ph.D. from Georgia Tech. His research interest lies in developing principled and practical machine learning methods for Decision AI, including reinforcement learning. He is a recipient of best paper awards at AISTATS and a NeurIPS workshop. He regularly serves as an area chair or senior program committee member at major AI/ML conferences such as ICML, NeurIPS, AISTATS, and ICLR.

Thursday, 10/05/2023, Time: 11:00am – 12:15pm, Comparing Climate Time Series

Tim DelSole, Professor
Department of Atmospheric, Oceanic, and Earth Sciences, George Mason University

Location: Haines Hall A25

Abstract:

In climate science, two questions arise repeatedly: (1) Has climate variability changed over time? (2) Do climate models accurately represent reality? Answering these questions requires a procedure for deciding whether two data sets might have originated from the same source. While numerous statistical methods exist for comparing two data sets, most do not adequately account for spatial and temporal correlations and possible non-stationary signals in a comprehensive test. In this talk, I discuss a method that fills this gap. The basic idea is to assume that each data set comes from a vector autoregressive model. This model can capture typical spatial and temporal correlations in climate data in a parsimonious manner, and non-stationary signals can be captured by adding suitable forcing terms. Deciding whether two data sets came from the same source then reduces to deciding whether two autoregressive models share the same parameters. A decision rule and an associated significance test are derived from the likelihood ratio method. I discuss this procedure, along with additional procedures for isolating the source of any discrepancies, and apply it to assess the realism of climate model simulations of North Atlantic Sea Surface Temperatures. According to this test, every climate model differs stochastically from observations, and differs from every other climate model, except when the models originate from the same modeling center. In fact, differences among climate models are distinctive enough to serve as a fingerprint that differentiates a given model from both observations and any other model.
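The core likelihood-ratio idea can be sketched compactly for VAR(1) models fit by ordinary least squares: fit each data set separately, fit a pooled model that forces both to share the same parameters, and compare via an asymptotic chi-square test. This is a toy version under Gaussian assumptions; the procedure in the talk additionally handles higher lag orders, forcing terms for non-stationary signals, and the decomposition of discrepancies.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

def fit_var1(X):
    """OLS fit of a VAR(1) with intercept; return coefficients, MLE noise covariance, sample size."""
    Y, Z = X[1:], np.hstack([np.ones((len(X) - 1, 1)), X[:-1]])
    B, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    resid = Y - Z @ B
    return B, resid.T @ resid / len(Y), len(Y)

def logdet(S):
    return np.linalg.slogdet(S)[1]

# Two multivariate "climate" series, simulated here from the same VAR(1) so that H0 is true.
k = 3
A = 0.5 * np.eye(k) + 0.1 * rng.normal(size=(k, k))
def simulate(T):
    X = np.zeros((T, k))
    for t in range(1, T):
        X[t] = X[t - 1] @ A.T + rng.normal(size=k)
    return X

X1, X2 = simulate(400), simulate(400)

# Unconstrained: separate VAR(1) fits.  Constrained (H0): one common VAR(1) for both series.
_, S1, T1 = fit_var1(X1)
_, S2, T2 = fit_var1(X2)
Yp = np.vstack([X1[1:], X2[1:]])
Zp = np.vstack([np.hstack([np.ones((T1, 1)), X1[:-1]]),
                np.hstack([np.ones((T2, 1)), X2[:-1]])])
Bp, *_ = np.linalg.lstsq(Zp, Yp, rcond=None)
Rp = Yp - Zp @ Bp
Sp = Rp.T @ Rp / (T1 + T2)

# Likelihood ratio statistic and its asymptotic chi-square reference distribution.
lr = (T1 + T2) * logdet(Sp) - T1 * logdet(S1) - T2 * logdet(S2)
df = k * k + k + k * (k + 1) // 2      # AR matrix + intercept + noise covariance
print("LR =", round(lr, 2), " p-value =", round(1 - stats.chi2.cdf(lr, df), 3))
```

Since both series are simulated from the same model here, the test should typically not reject; simulating the second series from a different coefficient matrix makes the statistic grow accordingly.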

Bio:

Tim DelSole is a statistical climate scientist who studies the extent to which future climate changes can be predicted on time scales from weeks to years. He is also a senior research scientist at the Center for Ocean-Land-Atmosphere Studies and serves as co-Chief Editor of the Journal of Climate. After completing his doctorate at Harvard University in 1993, he spent two years as a Global Change Distinguished Postdoctoral Fellow and two years as a National Research Council Associate at the NASA Goddard Space Flight Center. In 1997, he joined the GMU Center for Ocean-Land-Atmosphere Studies.