2020 – 2021 Academic Year

Tuesday, 10/20/2020, Time: 11am – 12:15pm PST
Topic: How to incorporate personal densities into predictive models: Pairwise Density Distances, Regularized Kernel Estimation and Smoothing Spline ANOVA models

Prof. Grace Wahba, I.J. Schoenberg-Hilldale Emerita Professor of Statistics
University of Wisconsin-Madison

We are concerned with the use of personal density functions or personal sample densities as subject attributes in prediction and classification models. The situation is particularly interesting when it is desired to combine other attributes with the personal densities in a prediction or classification model.

The procedure is (for each subject) to embed their sample density into a Reproducing Kernel Hilbert Space (RKHS), use this embedding to estimate pairwise distances between densities, use Regularized Kernel Estimation (RKE) with the pairwise distances to embed the subject (training) densities into a Euclidean space, and use the Euclidean coordinates as attributes in a Smoothing Spline ANOVA (SSANOVA) model. Elementary expository introductions to RKHS, RKE and SSANOVA occupy most of this talk.
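The first two steps of this pipeline can be sketched in a few lines. The code below is only an illustration under simplifying assumptions: it uses the biased plug-in estimator of the squared maximum mean discrepancy (MMD) as the RKHS distance between sample densities, and classical multidimensional scaling stands in for RKE when converting pairwise distances to Euclidean coordinates. All function names are hypothetical.

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    # Gaussian (RBF) kernel between two 1-D samples, evaluated pairwise.
    d = x[:, None] - y[None, :]
    return np.exp(-gamma * d**2)

def mmd_distance(x, y, gamma=1.0):
    # Biased estimate of the squared RKHS distance between the kernel
    # mean embeddings of two samples (squared MMD).
    kxx = rbf_kernel(x, x, gamma).mean()
    kyy = rbf_kernel(y, y, gamma).mean()
    kxy = rbf_kernel(x, y, gamma).mean()
    return kxx + kyy - 2.0 * kxy

def euclidean_embedding(samples, gamma=1.0, dim=2):
    # Step 1: pairwise squared distances between embedded sample densities.
    n = len(samples)
    d2 = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d2[i, j] = d2[j, i] = mmd_distance(samples[i], samples[j], gamma)
    # Step 2 (classical MDS as a stand-in for RKE): double-center the
    # squared-distance matrix and keep the leading eigenvectors as
    # Euclidean coordinates.
    J = np.eye(n) - np.ones((n, n)) / n
    G = -0.5 * J @ d2 @ J
    w, v = np.linalg.eigh(G)
    idx = np.argsort(w)[::-1][:dim]
    return v[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))
```

The returned coordinates are the per-subject attributes that would then enter an SSANOVA model alongside the subjects' other covariates.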

Dr. Grace Wahba is an American statistician and now-retired I. J. Schoenberg-Hilldale Professor of Statistics at the University of Wisconsin–Madison. She is a pioneer in methods for smoothing noisy data. Best known for the development of generalized cross-validation and “Wahba’s problem,” she has developed methods with applications in demographic studies, machine learning, DNA microarrays, risk modeling, medical imaging, and climate prediction.

Dr. Wahba is a member of the National Academy of Sciences and a fellow of several academic societies including the American Academy of Arts and Sciences, the American Association for the Advancement of Science, the American Statistical Association, and the Institute of Mathematical Statistics. Over the years she has received a selection of notable awards in the statistics community:
– R. A. Fisher Lectureship, COPSS, August 2014
– Gottfried E. Noether Senior Researcher Award, Joint Statistics Meetings, August 2009
– Committee of Presidents of Statistical Societies Elizabeth Scott Award, 1996
– First Emanuel and Carol Parzen Prize for Statistical Innovation, 1994

Tuesday, 10/13/2020, Time: 11am – 12:15pm PST
Reflections on Breiman’s Two Cultures of Statistical Modeling & An updated dynamic Bayesian forecasting model for the 2020 election

Andrew Gelman, Professor of Statistics
Columbia University

Abstract: In this talk, Dr. Gelman will discuss two papers:

Reflections on Breiman’s Two Cultures of Statistical Modeling

In an influential paper from 2001, the statistician Leo Breiman distinguished between two cultures in statistical modeling: “One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown.” Breiman’s “two cultures” article deserves its fame: it includes many interesting real-world examples and an empirical perspective that is a breath of fresh air compared to the standard approach of statistics papers at that time, which was a mix of definitions, theorems, and simulation studies showing the coverage of nominal 95% confidence intervals.

An updated dynamic Bayesian forecasting model for the 2020 election

We constructed an election forecasting model for the Economist that builds on Linzer’s (2013) dynamic Bayesian forecasting model and provides an election day forecast by partially pooling two separate predictions: (1) a forecast based on historically relevant economic and political factors such as personal income growth, presidential approval, and incumbency; and (2) information from state and national polls during the election season. The two sources of information are combined using a time-series model for state and national opinion. Our model also accounts for some aspects of non-sampling errors in polling. The model is fit using the open-source statistics packages R and Stan (R Core Team, 2020; Stan Development Team, 2020) and is updated every day with new polls. The forecast is available at https://projects.economist.com/us-2020-forecast/president, a description of the model-building process is at https://projects.economist.com/us-2020-forecast/president/how-this-works, and all code is at https://github.com/TheEconomist/us-potus-model.
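The core idea of partially pooling a fundamentals-based forecast with polling data can be illustrated with a conjugate normal-normal update, where each source is weighted by its precision. This is only a toy sketch of the weighting logic, not the Economist model, which uses a full Bayesian time-series model over state and national opinion fit in Stan; all names here are hypothetical.

```python
import numpy as np

def pooled_forecast(prior_mean, prior_sd, polls, poll_sd):
    # Combine a fundamentals-based prior (e.g., from economic and
    # political factors) with an average of polls, weighting each
    # source by its precision (conjugate normal-normal updating).
    n = len(polls)
    poll_mean = np.mean(polls)
    prior_prec = 1.0 / prior_sd**2          # precision of the prior
    poll_prec = n / poll_sd**2              # total precision of n polls
    post_var = 1.0 / (prior_prec + poll_prec)
    post_mean = post_var * (prior_prec * prior_mean + poll_prec * poll_mean)
    return post_mean, np.sqrt(post_var)
```

The pooled estimate falls between the two sources, pulled toward whichever is more precise; as polls accumulate over the election season, their total precision grows and the prior's influence shrinks.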

Dr. Andrew Gelman is a professor of statistics and political science at Columbia University. He has received the Outstanding Statistical Application award three times from the American Statistical Association, the award for best article published in the American Political Science Review, and the Council of Presidents of Statistical Societies award for outstanding contributions by a person under the age of 40. His books include Bayesian Data Analysis (with John Carlin, Hal Stern, David Dunson, Aki Vehtari, and Don Rubin), Teaching Statistics: A Bag of Tricks (with Deb Nolan), Data Analysis Using Regression and Multilevel/Hierarchical Models (with Jennifer Hill), Red State, Blue State, Rich State, Poor State: Why Americans Vote the Way They Do (with David Park, Boris Shor, and Jeronimo Cortina), A Quantitative Tour of the Social Sciences (co-edited with Jeronimo Cortina), and Regression and Other Stories (with Jennifer Hill and Aki Vehtari). Dr. Gelman has done research on a wide range of topics, including: why it is rational to vote; why campaign polls are so variable when elections are so predictable; why redistricting is good for democracy; reversals of death sentences; police stops in New York City; the statistical challenges of estimating small effects; the probability that your vote will be decisive; seats and votes in Congress; social network structure; arsenic in Bangladesh; radon in your basement; toxicology; medical imaging; and methods in surveys, experimental design, statistical inference, computation, and graphics.

Tuesday, 10/06/2020, Time: 11am – 12:00pm
Testing for a Change in Mean after Changepoint Detection

Prof. Paul Fearnhead, Distinguished Professor of Statistics
Lancaster University

While many methods are available to detect structural changes in a time series, few procedures are available to quantify the uncertainty of these estimates post-detection. In this work, we fill this gap by proposing a new framework to test the null hypothesis that there is no change in mean around an estimated changepoint. We further show that it is possible to efficiently carry out this framework in the case of changepoints estimated by binary segmentation, variants of binary segmentation, ℓ0 segmentation, or the fused lasso. Our setup allows us to condition on much smaller selection events than existing approaches, which yields higher-powered tests. Our procedure leads to improved power in simulation and additional discoveries in a dataset of chromosomal guanine-cytosine content. Our new changepoint inference procedures are freely available in the R package ChangepointInference. This is joint work with Sean Jewell and Daniela Witten.
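To make the setting concrete, the sketch below estimates a single changepoint by a CUSUM-style scan and reports the standardized difference in segment means. The key point of the talk is that naively treating this statistic as a standard z-statistic is invalid, because the same data chose the changepoint; the proposed framework corrects for that selection. This snippet (hypothetical function name) illustrates only the detection step, not the selective test.

```python
import numpy as np

def estimate_changepoint(y):
    # CUSUM-style scan: choose the split that maximizes the
    # standardized difference in means between the two segments.
    n = len(y)
    best_t, best_stat = 1, -np.inf
    for t in range(1, n):
        left, right = y[:t], y[t:]
        stat = abs(left.mean() - right.mean()) * np.sqrt(t * (n - t) / n)
        if stat > best_stat:
            best_t, best_stat = t, stat
    return best_t, best_stat
```

Because `best_t` is chosen to maximize the statistic, its null distribution is not standard normal; a valid p-value must condition on the event that this particular changepoint was selected, which is what the paper's framework computes efficiently.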

Related reading:
This is based on the paper: https://arxiv.org/pdf/1910.04291 and motivated by earlier work: https://academic.oup.com/biostatistics/advance-article-abstract/doi/10.1093/biostatistics/kxy083/5310127

Dr. Paul Fearnhead is Distinguished Professor of Statistics at Lancaster University. He is a researcher in computational statistics, in particular Sequential Monte Carlo methods. His interests include sampling theory and genetics; he has published several papers on the epidemiology of Campylobacter based on recombination events in a large sample of genomes. Since January 2018 he has been the editor of Biometrika. He won the Adams Prize and the Guy Medal in Bronze of the Royal Statistical Society in 2007.

Tuesday, 09/29/2020, Time: 11am – 12:15pm
Individual-centered Partial Information in Social Networks

Prof. Xin Tong, Assistant Professor
Data Sciences and Operations
University of Southern California

Most existing statistical network analysis literature assumes a global view of the network, under which community detection, testing, and other statistical procedures are developed. Yet in the real world, people frequently make decisions based on a partial understanding of the network. Since individuals rarely know the network beyond their friends’ friends, we assume that an individual of interest knows all paths of length up to L = 2 that originate from her. As a result, this individual’s perceived adjacency matrix B differs significantly from the usual adjacency matrix A based on global information. The new individual-centered partial information framework sparks an array of interesting endeavors from theory to practice. Key general properties of the eigenvalues and eigenvectors of B_E, a major term of B, are derived. These general results, coupled with the classic stochastic block model, lead to a new theory-backed spectral approach for detecting community memberships based on an anchored individual’s partial information. Real data analysis delivers interesting insights that cannot be obtained from global network analysis.
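One plausible reading of the partial-information setup is that every edge revealed by a path of length at most 2 from the anchored individual touches either the individual herself or one of her friends, so the perceived matrix keeps only the rows and columns of her closed neighborhood. The sketch below encodes that reading; it is an assumption for illustration, not the paper's exact definition of B, and the function name is hypothetical.

```python
import numpy as np

def perceived_adjacency(A, ego):
    # Edges revealed by paths of length <= 2 starting at `ego`:
    # each such edge is incident to the ego or to one of her friends,
    # so keep A restricted to the ego's closed neighborhood and
    # treat all other entries as unobserved (zero).
    n = A.shape[0]
    known = np.zeros(n, dtype=bool)
    known[ego] = True
    known |= A[ego].astype(bool)  # the ego's friends
    B = np.zeros_like(A)
    mask = known[:, None] | known[None, :]
    B[mask] = A[mask]
    return B
```

On a path graph 0–1–2–3 anchored at vertex 0, for example, the edge 1–2 survives (it is incident to a friend of 0) while the edge 2–3 is invisible, which is exactly the sense in which B can differ sharply from A.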

Dr. Xin Tong is an assistant professor at the Department of Data Sciences and Operations, University of Southern California. His research focuses on asymmetric supervised and unsupervised learning, high-dimensional statistics, and network-related problems. He is an associate editor for Journal of the American Statistical Association and Journal of Business and Economic Statistics. Before joining the current position, he was an instructor of statistics in the Department of Mathematics at the Massachusetts Institute of Technology. He obtained his Ph.D. in Operations Research from Princeton University.

Thursday, 09/24/2020, Time: 11am – 12:15pm
Active Learning and Projects with Engaging Contexts: Keys to Successful Teaching of Applied Statistics Face-to-Face and Remotely

Mahtash Esfandiari, Senior Lecturer
Stephanie Stacy, Graduate Student
Samuel Baugh, Graduate Student
UCLA Department of Statistics

The objective of this presentation is to: 1) elaborate a theoretical model that underlies successful teaching of applied statistics face-to-face and remotely, 2) explain the teaching and learning strategies that helped us reach the objectives underlying the proposed model, and 3) describe the evaluation strategies we used to assess the extent to which we reached our objectives in face-to-face and remote teaching. Sample projects, along with quantitative and qualitative findings, will be presented from courses on “Introduction to Statistical Consulting” and “Linear Models” to illustrate the application of the proposed theoretical model to teaching applied statistics face-to-face before the pandemic and on Zoom after its onset.