Tuesday, 06/05/2018, Time: 2:00PM
Statistics Weekly Seminar
Easily Parallelizable and Distributable Algorithms for High-dimensional Optimization in Statistical Computing

Location: Haines A25
Joong-Ho (Johann) Won, Associate Professor
Department of Statistics, Seoul National University

This talk consists of two parts. The first part concerns minimization of a sum of two convex functions, one typically a composition of non-smooth and linear functions. This formulation is prevalent in many statistical learning problems, including regression under structured sparsity assumptions. Popular algorithms for solving such problems, e.g., ADMM, often involve non-trivial optimization subproblems or smoothing approximation. We propose a continuum of preconditioned forward-backward operator splitting algorithms that do not incur these difficulties and are hence amenable to parallel and distributed computing. We further consider acceleration of the continuum and show that it attains the theoretically optimal rate of convergence for the entire continuum. The scalability of the proposed algorithms is demonstrated up to 1.2 million variables with a multi-GPU implementation. In the second part, inspired by the success of the first, we consider various statistical computing problems solvable in a scalable fashion on modern high-performance computing environments. Taking model problems from Zhou, Lange and Suchard (2010), we show that problems such as nonnegative matrix factorization, multidimensional scaling, and PET image reconstruction can be tackled at a much larger scale with much less programming effort than in 2010.
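
For readers unfamiliar with forward-backward splitting, the following is a minimal sketch of the plain (unpreconditioned, unaccelerated) iteration on a toy problem. It is not the speaker's algorithm. The objective, a separable quadratic plus an L1 penalty, and the names soft_threshold, forward_backward, a, b, and lam are all assumptions chosen for illustration.

```python
def soft_threshold(v, tau):
    """Proximal operator of tau * |.|: shrink v toward zero by tau."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def forward_backward(a, b, lam, n_iter=200):
    """Minimize sum_i 0.5*a[i]*(x[i]-b[i])**2 + lam*sum_i |x[i]| (toy problem)."""
    step = 1.0 / max(a)  # 1 / Lipschitz constant of the smooth gradient
    x = [0.0] * len(b)
    for _ in range(n_iter):
        # Forward step: gradient descent on the smooth quadratic part.
        z = [xi - step * ai * (xi - bi) for xi, ai, bi in zip(x, a, b)]
        # Backward step: proximal map of the non-smooth L1 part.
        x = [soft_threshold(zi, step * lam) for zi in z]
    return x


# Hypothetical data; each coordinate converges to soft_threshold(b[i], lam / a[i]).
solution = forward_backward(a=[1.0, 2.0, 4.0], b=[3.0, -0.2, 1.0], lam=1.0)
```

Each coordinate is updated independently here, which hints at why splitting schemes with simple proximal steps lend themselves to parallel hardware.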

Tuesday, 05/29/2018, Time: 2:00PM
Statistics Weekly Seminar
Brief Presentations by UCLA Statistics Faculty

Location: Haines A25
Various faculty members from UCLA Statistics

There will be brief presentations by our own faculty. They will talk about their recent and current work. The faculty participating will be: Tao Gao, Guido Montufar, Mark Handcock, Mahtash Esfandiari (with Noreen Webb from UCLA Education), Rick Schoenberg, and Hongquan Xu.

Tuesday, 05/22/2018, Time: 2:00PM
Statistics Weekly Seminar
Intentional Control of Type I Error Over Unconscious Data Distortion: A Neyman-Pearson Approach

Location: Haines A25
Yanhui Wu, USC Marshall School of Business

The rise of social media enables millions of citizens to generate information on sensitive political issues and social events, which is scarce in authoritarian countries and is tremendously valuable for surveillance and social studies. In the enormous efforts to utilize social media information, censorship stands as a formidable obstacle for informative description and accurate statistical inference. Likewise, in medical research, disease type proportions in the samples might not represent the proportions in the general population. To solve the information distortion problem caused by unconscious data distortion, such as non-predictable censorship and non-representative sampling, we propose a new distortion-invariant statistical approach to parse data, based on the Neyman-Pearson (NP) classification paradigm. Under general conditions, we derive explicit formulas for the after-distortion oracle classifier with explicit dependency on the distortion rates β0 and β1 on Class 0 and Class 1 respectively, and show that the NP oracle classifier is independent of the distortion scheme. We illustrate the working of this new method by combining the recently developed NP umbrella algorithm with topic modeling to automatically detect posts that are related to strikes and corruption in samples of randomly selected posts extracted from Sina Weibo – the Chinese equivalent of Twitter. In situations where type I errors are unacceptably large under the classical classification framework, the use of our proposed approach allows for controlling type I errors under a desirable upper bound.

Tuesday, 05/15/2018, Time: 2:00PM
Statistics Weekly Seminar
Title: Current Topics in Hydrology of Possible Statistical Interest

Location: Haines A25
Dennis Lettenmaier, UCLA Department of Geography

The field of hydrology has strong roots in statistics, due largely to its heritage in water resources interest, and hence risk analysis associated with planning of water infrastructure (e.g., for too little or too much water). Some of that connection was lost with the move some 30 years ago to redefine hydrology as an element of Earth sciences. However, interest in hydrologic change over the last 10 years, as articulated in the 2008 “Stationarity is dead” paper, has cast some old problems in a new light. Here, I give an overview of four topics of current interest with statistical underpinnings.

  1. “If extreme precipitation is increasing, why aren’t floods?”
  2. “What methods are most appropriate for estimating nonstationary extremes and their attribution?”
  3. “What can we learn about sensitivity of runoff to precipitation and evaporative demand from historical streamflow and climate records?”
  4. “How best can we make use of climate model projections in hydrologic predictions?”

Tuesday, 05/08/2018, Time: 2:00PM
Statistics Weekly Seminar
Title: From Conduction to Induction: How Temperature Affects Delivery Date and Gestational Lengths in the United States

Location: Haines A25
Alan Barreca, UCLA Institute of the Environment and Sustainability

I investigate how ambient temperature affects delivery date and gestational lengths in the United States. The data come from the Vital Statistics Natality files for the years 1968 through 1988, when date of birth and county of residence are available in the data. Weather is assigned at the county-day level. The empirical model exploits unusual changes in temperatures to identify impacts so as not to be confounded by selection related to season-of-birth preferences. There is a 5% increase in births on days where the mean temperature exceeds 80F and a 2% increase in births on the next day. I observe an equivalent decline in births spread over the following two weeks, suggesting that hot weather causes a forward shift in delivery date of up to two weeks. For those observations where gestational length is observed, I find that births are most vulnerable to temperature at 38 weeks of gestation and beyond. There is little apparent effect on preterm delivery risk (<37 weeks). I also consider the implications of these findings with regard to climate change.

Tuesday, 05/01/2018, Time: 2:00PM
Statistics Weekly Seminar

Location: Haines A25
Zachary Steinert-Threlkeld, UCLA Luskin School of Public Affairs

The difficulty of measuring the effects of repression on dissent, and vice-versa, may be due to the high variance of outcomes generated through the diffusion of information on interpersonal (social) networks. Fat tails dominate the distribution of connections in social networks and the size of protests, and this paper creates a model that connects those facts. A series of simulations varies the size and frequency of mass mobilization as a function of the distribution of connections and the level of repression on a network, generating two primary, unintuitive results. First, mass mobilization does not depend on the density or clustering of the network of the initial protesters. Second, there are several nonlinear relationships between the scaling of connections and the scaling of protests. The most realistic results occur for social networks that are dominated neither by well-connected nor by poorly connected individuals. The correlation between the distribution of connections in the network, the network structure of the initial protesters, and the size of subsequent protest also changes nonlinearly. After presenting the model and its results, we discuss how it explains why states repress all protests and the forms of that repression, and how it suggests an answer to the repression-dissent puzzle.

Thursday, 4/26/2018, 1:30PM – 3:30PM
de Leeuw Seminar: Examples of MM Algorithms

Location: UCLA Faculty Center, California Room

Kenneth Lange, Professor
UCLA Departments of Biomathematics, Human Genetics, and Statistics

This talk will survey the history, theory, and applications of the MM principle introduced to statistics by Jan de Leeuw. The MM principle furnishes a framework for constructing monotone optimization algorithms in high-dimensional models. The MM principle transfers optimization from the objective function to a surrogate function and simplifies matters by: (a) separating the variables of a problem, (b) avoiding large matrix inversions, (c) linearizing a problem, (d) restoring symmetry, (e) dealing gracefully with equality and inequality constraints, and (f) turning a non-differentiable problem into a smooth problem. The art in devising an MM algorithm lies in choosing a tractable surrogate function g(x | x_n) that hugs the objective function f(x) as tightly as possible. The EM principle from statistics is a special case of the MM principle. Modern mathematical themes such as sparsity, constraint satisfaction, and parallelization mesh well with the MM principle. Sample applications will include robust regression, k-means clustering, averaged and alternating projections, split feasibility algorithms, and sparse eigenvalue estimation.
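
As a small illustration of the surrogate idea, the sketch below applies an MM iteration to a robust location estimate, assuming a smoothed absolute-deviation objective. It is not taken from the talk. The objective, the smoothing constant eps, and the names smoothed_l1 and mm_median are hypothetical. Each square-root term is majorized by a quadratic that touches it at the current iterate, so every update reduces to a weighted average and the objective never increases.

```python
import math


def smoothed_l1(m, xs, eps):
    """Objective f(m) = sum_i sqrt((x_i - m)^2 + eps), a smooth stand-in for sum |x_i - m|."""
    return sum(math.sqrt((x - m) ** 2 + eps) for x in xs)


def mm_median(xs, eps=1e-8, n_iter=100):
    """MM iteration: minimize a quadratic surrogate that majorizes f at the current m."""
    m = sum(xs) / len(xs)  # start from the (non-robust) mean
    history = [smoothed_l1(m, xs, eps)]
    for _ in range(n_iter):
        # Concavity of sqrt gives sqrt(u) <= sqrt(u_n) + (u - u_n) / (2 sqrt(u_n)),
        # so the surrogate is weighted least squares with these weights.
        w = [1.0 / math.sqrt((x - m) ** 2 + eps) for x in xs]
        # Minimizing the surrogate is a weighted mean.
        m = sum(wi * xi for wi, xi in zip(w, xs)) / sum(w)
        history.append(smoothed_l1(m, xs, eps))
    return m, history


# Hypothetical data with one gross outlier.
estimate, objective_values = mm_median([1.0, 2.0, 3.0, 4.0, 100.0])
```

On this data the estimate moves from the mean toward the sample median, and the recorded objective values form a non-increasing sequence, which is the monotonicity property the MM principle guarantees.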

Tuesday, 04/24/2018, Time: 2:00PM
Statistics Weekly Seminar
Novel time-varying effects modeling and applications

Location: Haines A25
Damla Senturk, UCLA Biostatistics

Two projects on time-varying effects modeling with applications to brain imaging and nephrology will be discussed. The first part of the talk will be on multidimensional modeling of electroencephalography (EEG) data. EEG data created in event-related potential (ERP) experiments have a complex high-dimensional structure. Each stimulus presentation, or trial, generates an ERP waveform which is an instance of functional data. The experiments are made up of sequences of multiple trials, resulting in longitudinal functional data and moreover, responses are recorded at multiple electrodes on the scalp, adding an electrode dimension. Traditional EEG analyses involve multiple simplifications of this structure to increase the signal-to-noise ratio, effectively collapsing the functional and longitudinal components by identifying key features of the ERPs and averaging them across trials. Motivated by an implicit learning paradigm used in autism research in which the functional, longitudinal and electrode components all have critical interpretations, we propose multidimensional functional principal components techniques which do not collapse any of the dimensions of the ERP data. The proposed methods are shown to be useful for modeling longitudinal trends in the ERP functions, leading to novel insights into the learning patterns of children with Autism Spectrum Disorder (ASD) and their typically developing peers.

The second part of the talk will be on time-dynamic profiling with application to hospital readmission among patients on dialysis. Standard profiling analysis aims to evaluate medical providers, such as hospitals, nursing homes or dialysis facilities, with respect to a patient outcome. Profiling methods exist mostly for non-time-varying patient outcomes. However, for patients on dialysis, a unique population which requires continuous medical care, methodologies to monitor patient outcomes continuously over time are particularly relevant. Thus, we introduce a novel time-dynamic profiling (TDP) approach to assess the time-varying 30-day hospital readmission rate, throughout the time period that patients are on dialysis. We develop the framework for TDP by introducing the standardized dynamic readmission ratio as a function of time and a multilevel varying coefficient model with facility-specific time-varying effects. We propose estimation and inference procedures tailored to the problem of TDP and to overcome the challenge of high-dimensional parameters when examining thousands of dialysis facilities.

Tuesday, 04/17/2018, Time: 2:00PM
Statistics Weekly Seminar
Hypernetworks: Generating Neural Network Weights with Neural Networks

Location: Haines A25
Lior Deutsch

A hypernetwork is a neural network that transforms a random input into the weights for another neural network – the target network. Hypernetworks can be used to generate ensembles of target networks, and can also be used as a tool to probe the “space” of trained target networks. In this talk I will present a general formulation for the objective function of hypernetworks, and explain how this objective generalizes previous work on Bayesian hypernetworks. I will describe how to train the hypernetwork to generate a diverse set of weights, while taking into account symmetries of the target network. I will show how to use parameter sharing to reduce the size of the hypernetwork. I will present results that demonstrate that the hypernetwork can generate diverse weights on non-trivial manifolds.

Tuesday, 04/10/2018, Time: 2:00PM
Statistics Weekly Seminar: Projects from the 141SL Course

Location: 2258A Franz Hall
Dr. Daniel Rootman, UCLA Jules Stein Eye Center and teams from 141SL course

We will have talks by two teams from the 141SL course: one on “Risk factors of ptosis” and one on “Eyebrow asymmetry issues before and after eyelid surgery”. Both projects involved consultations with Dr. Daniel Rootman of the Jules Stein Eye Center, who will also join to introduce the projects.

Tuesday, 03/13/2018, Time: 2:00PM – 3:15PM
Title: Rules of Engagement in Evidence-Informed Policy: Practices and Norms of Statistical Science in Government

Location: Franz Hall 2258A
Jake Bowers, Associate Professor at University of Illinois and Fellow of the Office of Evaluation Sciences

Collaboration between statistical scientists (data scientists, behavioral and social scientists, statisticians) and policy makers promises to improve government and the lives of the public. And the data and design challenges arising from governments offer academics new chances to improve our understanding of both extant methods and behavioral and social science theory. However, the practices that ensure the integrity of statistical work in the academy — such as transparent sharing of data and code — do not translate neatly or directly into work with governmental data and for policy ends. This paper proposes a set of practices and norms that academics and practitioners can agree on before launching a partnership so that science can advance and the public can be protected while policy can be improved. This work is at an early stage. The aim is a checklist or statement of principles or memo of understanding that can be a template for the wide variety of ways that statistical scientists collaborate with governmental actors.

Monday, 02/26/2018, Time: 2:00PM – 3:15PM
Statistics Weekly Seminar
Modeling and Predicting Regional Climate Variability

Location: 4242 Young Hall
Karen McKinnon, Applied Scientist at Descartes Labs

As the climate changes, it is imperative that we improve our understanding of regional temperature and precipitation variability. To do so, it is necessary to merge insights from observations, dynamical models, and statistical models. In this talk, I will take this combined approach to answer questions about whether temperature variability is increasing, the influence of internal variability on climate signals, and links between climatic boundary conditions and extreme weather events. In particular, I will first present a method to concisely quantify trends in non-normal distributions, and apply it to daily summer temperature across the Northern Hemisphere. The vast majority of the observed changes in temperature can be explained by a shift in distributions without changes in shape. Trends in the observations, however, are reflective of both anthropogenic forcing and internal variability. Initial condition ensembles from climate models have illuminated the large influence of internal variability on perceived climate signals, but they suffer from biases that can limit their use for regional climate studies. As a complementary tool, I will present the Observational Large Ensemble (OLENS), which is based on a statistical model fit to the observations. The OLENS allows for more realistic simulation of internal variability that I will discuss in the context of climate change and the El Nino-Southern Oscillation. Finally, I will explore the use of statistical models to predict high-impact summer heat waves based on climatic boundary conditions. Both the land and the sea surfaces can be used to provide skillful predictions of Eastern US heat waves up to seven weeks in advance. Analysis of long control runs from fully-coupled and fixed-ocean climate model simulations suggests that the ocean does not play an active role in causing the heat waves.

Dr. McKinnon received her PhD in Earth and Planetary Sciences with a secondary field in Computational Science and Engineering in 2015 from Harvard. Her dissertation work focused on developing a hierarchy of models for the continuum of temperature variability from seasonal to decadal timescales, and assessing the use of climatic boundary conditions to predict extreme events. She then joined the Climate Analysis Section at the National Center for Atmospheric Research as an Advanced Study Program post-doctoral fellow where she worked on merging observational and model insights about internal variability. She is presently an Applied Scientist at Descartes Labs using machine learning methods to build models from remote sensing and weather data for applications such as crop forecasting and water management.

Friday, 02/23/2018, 2:00PM – 3:00PM
Designing a “Cyber-Based” Article Bank to Enhance Statistics Education

Location: 1100 Terasaki Life Sciences Building (CEILS Journal Club)
Mahtash Esfandiari, Senior Continuing Lecturer at UCLA Statistics

Dr. Esfandiari will share the results of a study she conducted on using in-person versus virtual office hours when teaching statistics in a presentation titled, “Designing a “Cyber-Based” Article Bank to Enhance Statistics Education.”

Friday, 02/23/2018, Time: 2:00PM – 3:15PM
IoES / Statistics Faculty Search Job Talk

Location: 4242 Young Hall
Daniela Castro Camilo, Postdoctoral Fellow at King Abdullah University of Science and Technology

Abstract: TBA

Wednesday, 02/21/2018, 12:00PM – 1:30PM
Heterogeneous Causal Effects: A Propensity Score Approach

Location: 4240 Public Affairs Building (Joint with CCPR and CSS)
Yu Xie, Professor at Princeton University

Heterogeneity is ubiquitous in social science. Individuals differ not only in background characteristics, but also in how they respond to a particular treatment. In this presentation, Yu Xie argues that a useful approach to studying heterogeneous causal effects is through the use of the propensity score. He demonstrates the use of the propensity score approach in three scenarios: when ignorability is true, when treatment is randomly assigned, and when ignorability is not true but there are valid instrumental variables.

Tuesday, 02/20/2018, 2:00PM – 3:15PM
IoES / Statistics Faculty Search Job Talk
Characterizing and Quantifying Benefits of Mitigation and Avoided Impacts – Challenges and Opportunities

Location: Franz Hall 2258A
Claudia Tebaldi, Project Scientist at National Center for Atmospheric Research

The research topic of avoided impacts or, alternatively framed, benefits of mitigation – i.e., the comparison of outcomes across alternative possible climate futures — is a compelling framework for the analysis of expected future outcomes from climatic changes. It offers a fertile ground for analysis ranging from physical climate impacts to their repercussions on natural and human systems (e.g., heat and precipitation extremes and their effects on human well-being, agricultural yield changes and food security, tropical cyclone changes in intensity or frequency and ensuing damages on infrastructures). Furthermore, the necessity of quantifying differential impacts opens up interesting statistical questions, from basic signal-to-noise characterizations, to the construction of econometric models to determine dose-response functions for the system impacted.

Over the last couple of years, I have been contributing to an activity at NCAR framed in terms of the characterization of “Benefits of Reduced Anthropogenic Climate change”: BRACE. We have relied on a set of large initial condition ensembles run with NCAR-DOE’s earth system model, CESM. The different ensembles explore climate outcomes under several scenarios, some of them newly designed low-warming pathways, addressing specifically the Paris targets of 1.5C and 2.0C global warming above preindustrial. I will present some examples from the analyses that I conducted, in collaboration with colleagues within and outside of NCAR, specifically about the analysis of changes in heat extremes under different scenarios utilizing extreme value statistics, and the quantification of expected changes in global crop yields from warming temperatures utilizing an empirical model of the relation between weather and yield shocks. These analyses span a wide range of the methodological challenges that uncertainty quantification encounters, and offer perspectives on exciting future work.

Claudia Tebaldi has been a project scientist in the Climate Change Research section of the Climate and Global Dynamics laboratory at NCAR since October 2013 and she is Senior Science Advisor at Climate Central Inc. She holds a Ph.D. in Statistics from Duke University, and she was a postdoc and then a project scientist at NCAR from 1997 to 2007. From 2008 to 2013 she worked as a research scientist for Climate Central, a research and communication organization. Her research focuses on the analysis and statistical characterization of climate change projections and their uncertainty, as derived from climate models, extending from the impacts on the physical climate system, with particular interest in the characterization of changes in extremes, to impacts on human and natural systems, such as agricultural yields, water resources, and health. She is also interested in the detection of observed changes and their attribution to anthropogenic influences. She was a lead author in Working Group 1 of the Intergovernmental Panel on Climate Change, Fifth Assessment Report, Chapter 12, Long Term Projections, Commitments and Irreversibility, and is a co-chair of the World Climate Research Program ScenarioMIP group, responsible for the experimental design of the forthcoming Coupled Model Intercomparison Project, Phase 6, experiments exploring future scenarios, and a member of the Scientific Steering Committee of DAMIP, a similar group responsible for organizing experiments relevant to the Detection and Attribution research community. She is also a member of the Scientific Steering Committee of the International ad-hoc Detection and Attribution Group.

Tuesday, 02/06/2018, 2:00PM – 3:30PM
Modelling Mobility Tables as Weighted Networks

Location: Franz Hall 2258A
Per Block, Chair of Social Networks at ETH Zurich

Contemporary research on occupational mobility, i.e. how people move between jobs, tends to view mobility as being mostly determined by individual and occupational characteristics. These studies focus on people’s sex, ethnicity, age, education or class origin and how they get access to jobs of different wages, working conditions, desirability, skill profiles and job security. Consequently, observations in occupational mobility tables are understood as independent of one another, which allows the use of a variety of well-developed statistical models. As opposed to these “classical” approaches focussed on individual and occupational characteristics, I am interested in modelling and understanding endogenously emerging patterns in occupational mobility tables. These emergent patterns arise from the social embedding of occupational choices, when occupational transitions of different individuals influence each other. To analyse these emergent patterns, I conceptualise a disaggregated mobility table as a network in which occupations are the nodes and connections are made of individuals transitioning between occupations. In this paper, I present a statistical model to analyse these weighted mobility networks. The approach to modelling mobility as an interdependent system is inspired by the exponential random graph model (ERGM); however, some differences arise from ties being weighted as well as from specific constraints of mobility tables. The model is applied to data on intra-generational mobility to analyse the interdependent transitions of men and women through the labour market, as well as to understand the extent to which clustering in mobility can be modelled by exogenously defined social classes or through endogenous structures.

Tuesday, 01/30/2018, 12:30PM – 1:45PM
Special Seminar: What Use is ‘Time-Expired’ Disparity and Optic Flow Information to a Moving Observer?

Location: Royce 160
Andrew Glennerster, Professor of Visual Neuroscience at University of Reading

It is clear that optic flow is useful to guide an observer’s movement and that binocular disparity contributes too (e.g. Roy, Komatsu and Wurtz, 1992). Both cues are important in recovering scene structure. What is less clear is how the information might be useful after a few seconds, when the observer has moved to a new vantage point and the egocentric frame in which the information was gathered is no longer applicable. One answer, pursued successfully in computer vision, is to interpret any new binocular disparity and optic flow information in relation to a 3D reconstruction of the scene (Simultaneous Localisation and Mapping, SLAM). Then, as the estimate of the camera pose is updated, the 3D information computed from earlier frames is always relevant. No-one suggests that animals carry out visual SLAM, at least not in the way that computer vision implements it and yet we have no serious competitor models. Reinforcement learning is just beginning to approach 3D tasks such as navigation and to build representations that are quite unlike a 3D reconstruction. I will describe psychophysical tasks from our VR lab where participants point to unseen targets after navigating to different locations. There are large systematic biases in performance on these tasks that rule out (in line with other evidence) the notion that humans build a stable 3D reconstruction of the scene that is independent of the task at hand. I will discuss some indications about what it might do instead.

Andrew Glennerster studied medicine at Cambridge before working briefly with Michael Morgan at UCL then doing a DPhil and an EU-funded postdoc with Brian Rogers on binocular stereopsis (1989 – 1994). He held an MRC Career Development Award (1994 – 1998) with Andrew Parker in Physiology at Oxford including a year with Suzanne McKee in Smith-Kettlewell, San Francisco. He continued work with Andrew Parker on a Royal Society University Research Fellowship (1999 – 2007) which allowed him to set up a virtual reality laboratory to study 3D perception in moving observers, funded for 12 years by the Wellcome Trust. He moved to Psychology in Reading in 2005, first as a Reader and now as a Professor, where the lab is now funded by EPSRC.

Tuesday, 01/23/2018, 2:00PM – 3:30PM
Statistics: A Window to Understanding Diversity

Location: Franz Hall 2258A
Mahtash Esfandiari, Senior Continuing Lecturer, UCLA Statistics

Students who have entered the UCLA College of Letters and Science since the Fall Quarter of 2015 are required to take a diversity course as a graduation requirement. Currently more than 200 courses qualify, but only three are offered in the Physical Sciences Division, one of which is entitled “Statistics: A Window to Understanding Diversity”. In this talk, I will: 1) discuss the goals and the theoretical underpinnings of “Statistics: A Window to Understanding Diversity”, 2) explain how this experience-based course allows students from STEM (south campus) and non-STEM (north campus) majors to learn and critically analyze the theory of diversity, 3) describe how students learn to use the statistical methods taught in the course to analyze extensive data collected on UCLA students’ perception of our campus climate, and 4) discuss quantitative and qualitative findings that demonstrate how this course has enabled our students to think critically about diversity issues, develop a better understanding of the diversity climate at UCLA, and adjust more successfully to our campus and the diverse world in which we live. Additionally, I will elaborate on how collaboration between the Statistics Department and the Office of the Vice Chancellor of Diversity, “Equity, Diversity, and Inclusion”, together with support from the Vice Chancellor of Diversity, the Dean of Physical Sciences, and other scholars at UCLA, has helped us in promoting the diversity initiative on our campus.

Tuesday, 01/16/2018, 2:00PM – 3:30PM
Classification Errors and the Neyman-Pearson Classification

Location: Franz Hall 2258A
Jingyi Jessica Li, Assistant Professor, UCLA Statistics

In this talk, I will first introduce the theoretical framework of binary classification and the concepts of population and empirical versions of the risk, the type I error and the type II error of a binary classifier, with a focus on their different sources of randomness. Second, I will introduce the Neyman-Pearson (NP) classification paradigm, which aims to minimize the population type II error while enforcing an upper bound on the population type I error. Under the NP paradigm, I will further introduce our recent work including (1) an umbrella algorithm for implementing the NP classification with popular binary classification methods, such as Logistic Regression, Support Vector Machines, and Random Forests, (2) a graphical tool, NP-ROC bands, for visualizing NP classification results, and (3) a feature ranking method for selecting marginally important features and jointly important feature sets in a high-dimensional setting.
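
To make the type I error constraint concrete, here is a minimal sketch of one order-statistic thresholding rule. The abstract does not spell out the umbrella algorithm, so this is an assumption about how such a rule can work and not the speaker's implementation. The names violation_bound and np_threshold, the tolerance delta, and the held-out class-0 scores are hypothetical. The rule thresholds at the k-th smallest held-out class-0 score, choosing the smallest k for which a binomial tail bounds the probability that the population type I error exceeds alpha.

```python
import math


def violation_bound(k, n, alpha):
    """Bound on P(type I error > alpha) when thresholding at the k-th smallest
    (1-indexed) of n i.i.d. held-out class-0 scores."""
    return sum(
        math.comb(n, j) * (1 - alpha) ** j * alpha ** (n - j)
        for j in range(k, n + 1)
    )


def np_threshold(class0_scores, alpha=0.05, delta=0.05):
    """Return (k, threshold) for the smallest k whose violation bound is <= delta,
    or None when the sample is too small for any order statistic to qualify."""
    n = len(class0_scores)
    ordered = sorted(class0_scores)
    for k in range(1, n + 1):
        if violation_bound(k, n, alpha) <= delta:
            return k, ordered[k - 1]
    return None


# Hypothetical classifier scores on 100 held-out class-0 observations.
scores = [i / 100 for i in range(100)]
result = np_threshold(scores)  # predict class 1 when a new score exceeds the threshold
```

A smaller k gives a lower threshold and hence a smaller type II error, so taking the smallest admissible k is the least conservative choice under this bound. With too few class-0 observations no k qualifies, and the function returns None.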

Tuesday, 12/12/2017, 2:00PM – 3:30PM
Dynamic Modeling for Health in the Age of Big Data

Location: 4240 Public Affairs Building
Prof. Nathaniel Osgood, Professor, Department of Computer Science, Associate Faculty at Department of Community Health & Epidemiology and Bioengineering Division
University of Saskatchewan

Traditional approaches to public health concerns have conferred great advances in the duration and quality of life. Public health interventions – from improved sanitation efforts, to vaccination campaigns, to contact tracing and environmental regulations – have helped reduce common risks to health throughout many areas of the world. Unfortunately, while traditional methods from the health sciences have proven admirably suited for addressing traditional challenges, a troubling crop of complex health challenges confronts the nation and the world, and threatens to stop – and even reverse – the rise in length and quality of life that many have taken for granted. Examples include multi-factorial problems such as obesity and obesity-related chronic disease, the spread of drug-resistant and rapidly mutating pathogens that evade control efforts, and “syndemics” of mutually reinforcing health conditions (such as Diabetes and TB; substance abuse, violence and HIV/AIDS; obesity & stress). Such challenges have proven troublingly policy resistant, with interventions being thwarted by “blowback” from the complex feedbacks involved, and attendant costs threaten to overwhelm health care systems. In the face of such challenges public health decision makers are increasingly supplementing their toolbox using “system science” techniques. Such methods – also widely known as “complex systems approaches” – provide a way to understand a system’s behavior as a whole and as more than the sum of its parts, and a means of anticipating and managing the behavior of a system in a more judicious and proactive fashion. However, such approaches offer substantially greater insight and power when combined with rich data sources. Within this talk, we will highlight the great promise afforded by combining Systems Science techniques and rich data sources, particularly emphasizing the role of cross-linking models with “big data” offering high volume, velocity, variety and veracity.
Examples of such data include fine-grained temporal and spatial information collected by smartphone-based and wearable as well as building and municipal sensors, data from social media posts and search behaviour, helpline calls, website accesses and rich cross-linked databases. Decision-oriented models grounded by such novel data sources can allow for articulated theory building regarding difficult-to-observe aspects of human behavior. Such models can also aid in informing evaluation of and judicious selection between sophisticated interventions to lessen the health burden of a wide variety of health conditions. Such models are particularly powerful when complemented by machine learning and computational statistics techniques that permit recurrent model regrounding in the newest evidence, allow a model to knit together a holistic portrait of the system as a whole, and support grounded investigation of tradeoffs between intervention strategies.

Tuesday, 11/28/2017, 2:00PM
Regression with a Surfeit of Known and Unknown Confounders

1434A Physics and Astronomy
Ed Leamer, Chauncey J., Professor in Economics and Statistics
UCLA Anderson School of Management

Professor Leamer will talk to us about his recent work on rethinking regression to consider uncertainty bias.

Tuesday, 11/14/2017, 2:00PM – 3:30PM
Dark Matter and Machine Learning

1434A Physics and Astronomy
Tommaso Treu, Professor
UCLA Department of Physics and Astronomy

Dark needles in a dark haystack

In the standard cosmological model, ninety-five percent of the energy content of the universe consists of dark energy and dark matter. Even though their abundance seems well determined, very little is known about their fundamental nature. I will describe how we can learn about the physics of the dark sector by studying in detail its gravitational effect on the trajectories of photons as they travel across the universe, a rare phenomenon known as strong gravitational lensing. I will show how one can use strong gravitational lenses with a time-variable background source to measure the expansion rate of the universe (Hubble constant) to 3.8% precision. This result is completely independent of the local distance ladder and the cosmic microwave background, and thus provides a new opportunity to understand whether the tension between the two arises from systematic uncertainties or may be indicative of new physics. Further progress in the field requires finding new strong gravitational lens systems. This is challenging because they are extremely rare on the sky (1/1000,000 sources) and typically only partially resolved at ground-based imaging resolution. In the past few years, machine learning techniques have started to be applied to the problem of finding lenses, with promising results. However, much work remains to be done on developing new methods and applying them to current and future large datasets.

Wednesday, 11/08/2017, 3:30PM – 4:50PM
Programming Data Science with R & the Tidyverse

1200 Rolfe Hall
Hadley Wickham

Tuesday, 11/07/2017, 2:00PM
Scalable Probabilistic PCA for Large-Scale Genetic Variation Data

1434A Physics and Astronomy
Sriram Sankararaman
UCLA Department of Computer Science

Principal Component Analysis (PCA) is a key tool in many genomic analyses. With the advent of large-scale datasets of genetic variation, there is a need for methods that can compute principal components with scalable computational and memory requirements. Exact methods for computing principal components do not scale to large datasets. For many genomic applications, it is sufficient to be able to compute a small number of the top principal components. We propose a scalable and exact algorithm to compute the top principal components on genetic variation data. Our method is based on a latent variable model from which PCA arises in the small variance limit. The latent variable formulation leads to a scalable iterative EM algorithm for computing the principal components. Leveraging the structure of genetic variation data allows us to obtain a sub-linear time algorithm for computing principal components.

We demonstrate the scalability and accuracy of our method. On standard hardware, our method computes the top five PCs in less than an hour on data from 500,000 individuals and 100,000 genetic variants. Finally, the probabilistic formulation allows this model to be generalized in several directions.

Tuesday, 10/24/2017, 2:00PM
Statistical Significance and Discussion of the Challenges of Avoiding the Abuse of Statistical Methodology

1434A Physics and Astronomy
Sander Greenland, Professor Emeritus
UCLA Department of Epidemiology

Sander Greenland will offer his perspective on the paper, “Redefine Statistical Significance”, which was the topic of the previous week’s seminar. He will also discuss the challenges of avoiding the abuse of statistical methodology.

Tuesday, 10/17/2017, 2:00PM
Redefine Statistical Significance

1434A Physics and Astronomy
Daniel Benjamin, Associate Professor
USC Dornsife Center for Economic and Social Research

Daniel Benjamin will discuss his paper (written by him and 71 other authors), “Redefine Statistical Significance”. The paper proposes that the default p-value threshold should be changed from 0.05 to 0.005.

The paper is available at this link.
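
For a sense of scale, the proposed change can be translated into the size of test statistic required. The sketch below assumes a two-sided z-test, which is chosen here purely for illustration and is not specified in the abstract; the function name is hypothetical.

```python
from statistics import NormalDist


def two_sided_z_threshold(alpha: float) -> float:
    """Return the |z| a statistic must exceed for a two-sided p-value below alpha.

    Assumes a standard normal null distribution (an illustrative assumption).
    """
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


z_old = two_sided_z_threshold(0.05)    # current default threshold
z_new = two_sided_z_threshold(0.005)   # proposed threshold
print(f"p < 0.05  requires |z| > {z_old:.3f}")
print(f"p < 0.005 requires |z| > {z_new:.3f}")
```

Under this assumption, the stricter threshold moves the cutoff from roughly 1.96 to roughly 2.81 standard errors.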

Tuesday, 6/06/2017, 2:00PM – 3:00PM
Scaling and Generalizing Bayesian Inference

Kinsey Pavilion 1240B
David Blei, Professor
Department of Statistics and Computer Science, Columbia University


Bayesian statistics and expressive probabilistic modeling have become key tools for the modern statistician. They let us express complex assumptions about the hidden structures that underlie our data and have been successfully applied in numerous fields.

The central computational problem in Bayesian statistics is posterior inference, the problem of approximating the conditional distribution of the hidden variables given the observations. Approximate posterior inference algorithms have revolutionized the field, revealing its potential as a usable and general-purpose language for data analysis.

Bayesian statistics, however, has not yet reached this potential. First, statisticians and scientists regularly encounter massive data sets, but existing approximate inference algorithms do not easily scale. Second, most approximate inference algorithms are not generic; each must be adapted to the specific model at hand.

In this talk I will discuss our recent research on addressing these two limitations. I will first describe stochastic variational inference, an approximate inference algorithm for handling massive data sets and I will demonstrate its application to probabilistic topic models of millions of articles. Then I will discuss black box variational inference, a generic algorithm for approximating the posterior. Black box inference easily applies to many models with little model-specific derivation and few restrictions on their properties. I will demonstrate its use on deep exponential families and describe how it enables powerful tools for probabilistic programming.

Tuesday, 5/30/2017, 2:00PM – 3:00PM
Prediction under Asymmetry: Empirical Bayes predictive methods for Check loss

Kinsey Pavilion 1240B
Gourab Mukherjee, Assistant Professor
Data Sciences and Operations, University of Southern California

A host of modern applications require prediction under asymmetric loss functions. The check loss is piecewise linear and is widely used in such applications for penalizing underestimation and overestimation in different ways. Here, we develop new empirical Bayes methods that can produce optimal prediction under agglomerative check losses in high dimensional hierarchical models. Because of the nature of this loss, our inferential target is a pre-chosen quantile of the predictive distribution rather than the mean of the predictive distribution. In common with many other problems we find that empirical Bayes shrinkage provides better performance than simple coordinate-wise rules. However, the problem here differs in fundamental respects from estimation or prediction under the quadratic losses considered in most of the previous literature. This necessitates different strategies for creation of effective empirical Bayes predictors. We develop new methods for constructing uniformly efficient asymptotic risk estimates for conditionally linear predictors. Minimizing these risk estimates we obtain an empirical Bayes prediction rule which has asymptotic optimality properties not shared by EB strategies that use maximum likelihood or method-of-moments to estimate the hyper-parameters.
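
The link between the check loss and quantiles mentioned above can be illustrated with a small sketch. It uses the standard check (pinball) loss with a level parameter `tau`; the function names and the toy data are assumptions made for illustration and are not taken from the talk.

```python
def check_loss(y: float, q: float, tau: float) -> float:
    """Piecewise-linear check loss for predicting q when y is observed.

    Underestimation (y > q) costs tau per unit; overestimation costs (1 - tau).
    """
    u = y - q
    return tau * u if u >= 0 else (tau - 1.0) * u


def best_constant_prediction(ys, tau):
    """Return the observed value minimizing the total check loss over ys."""
    return min(ys, key=lambda q: sum(check_loss(y, q, tau) for y in ys))


data = [1, 2, 3, 4, 5, 6, 7, 8, 9]      # hypothetical observations
median_like = best_constant_prediction(data, tau=0.5)
upper_quantile = best_constant_prediction(data, tau=0.8)
print(median_like, upper_quantile)
```

With a symmetric penalty (`tau=0.5`) the minimizer is the sample median; raising `tau` penalizes underestimation more heavily and pushes the optimal prediction toward an upper quantile, which is why the inferential target is a quantile rather than the mean.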

Tuesday, 5/25/2017, 2:00PM – 3:00PM
Power Enhancement Tests for High Dimensional Data with Applications to Genetic and Genomic Studies

Kinsey Pavilion 1240B
Lingzhou Xue, Assistant Professor
Department of Statistics, Pennsylvania State University

In this talk, I will discuss some new insights in hypothesis tests for analysis of high-dimensional data, which are motivated by genetic and genomic studies. In the current literature, two sets of test statistics are commonly used for various high-dimensional tests: 1) using extreme-value form statistics to test against sparse alternatives, and 2) using quadratic form statistics to test against dense alternatives. However, quadratic form statistics suffer from low power against sparse alternatives, and extreme-value form statistics suffer from low power against dense alternatives with small disturbances and may have size distortions due to their slow convergence. For real-world applications, it is important to derive powerful testing procedures against more general alternatives. Based on their joint limiting laws, we introduce new power enhancement testing procedures to boost the power against more general alternatives and retain the correct asymptotic size. Under the high-dimensional setting, we derive the closed-form limiting null distributions, and obtain their explicit rates of uniform convergence. We demonstrate the performance of our proposed test statistics in numerical studies.

Wednesday, 5/24/2017, 12:00pm – 1:30pm
Predicting the Evolution of Intrastate Conflict: Evidence from Nigeria

Presented by the Center for Social Statistics.

4240 Public Affairs Building
Shahryar Minhas
Postdoctoral Fellow, Duke University
Assistant Professor, Michigan State University
Department of Political Science and the Social Science Data Analytics Program (SSDA)

The endogenous nature of civil conflict has limited scholars’ abilities to draw clear inferences about the drivers of conflict evolution. We argue that three primary features characterize the complexity of intrastate conflict: (1) the interdependent relationships of conflict between actors; (2) the impact of armed groups on violence as they enter or exit the conflict network; and (3) the ability of civilians to influence the strategic interactions of armed groups. Using ACLED event data on Nigeria, we apply a novel network-based approach to predict the evolution of intrastate conflict dynamics. Our network approach yields insights about the effects of civilian victimization and key actors entering the conflict. Attacks against civilians lead groups to both be more violent, and to become the targets of attacks in subsequent periods. Boko Haram’s entrance into the civil war leads to an increase in violence even in unrelated dyads. Further, our approach significantly outperforms more traditional dyad-group approaches at predicting the incidence of conflict.

Tuesday, 5/23/2017, 2:00PM – 3:00PM
Panning for gold: Model-free knockoffs for high-dimensional controlled variable selection

Kinsey Pavilion 1240B
Jinchi Lv, Associate Professor
Data Sciences and Operations, University of Southern California

Many contemporary large-scale applications involve building interpretable models linking a large set of potential covariates to a response in a nonlinear fashion, such as when the response is binary. Although this modeling problem has been extensively studied, it remains unclear how to effectively control the fraction of false discoveries even in high-dimensional logistic regression, not to mention general high-dimensional nonlinear models. To address such a practical problem, we propose a new framework of model-free knockoffs, which reads from a different perspective the knockoff procedure (Barber and Candès, 2015) originally designed for controlling the false discovery rate in linear models. The key innovation of our method is to construct knockoff variables probabilistically instead of geometrically. This enables model-free knockoffs to deal with arbitrary (and unknown) conditional models and any dimensions, including when the dimensionality p exceeds the sample size n, while the original knockoffs procedure is constrained to homoscedastic linear models with n ≥ p. Our approach requires the design matrix be random (independent and identically distributed rows) with a covariate distribution that is known, although we show our procedure to be robust to unknown/estimated distributions. To our knowledge, no other procedure solves the controlled variable selection problem in such generality, but in the restricted settings where competitors exist, we demonstrate the superior power of knockoffs through simulations. Finally, we apply our procedure to data from a case-control study of Crohn’s disease in the United Kingdom, making twice as many discoveries as the original analysis of the same data. This is a joint work with Emmanuel Candès, Yingying Fan and Lucas Janson.
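
As background for the false discovery rate control mentioned above, the selection step of the knockoff filter (Barber and Candès, 2015) can be sketched as follows. This is only the final thresholding rule (the "knockoff+" variant), not the probabilistic construction of knockoff variables that the talk contributes. The statistics `W` are hypothetical numbers made up for illustration, where a large positive value suggests that an original variable beats its knockoff copy.

```python
def knockoff_plus_threshold(W, q):
    """Smallest t with (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q."""
    candidates = sorted({abs(w) for w in W if w != 0})
    for t in candidates:
        estimated_false = 1 + sum(1 for w in W if w <= -t)
        num_selected = max(1, sum(1 for w in W if w >= t))
        if estimated_false / num_selected <= q:
            return t
    return float("inf")  # no threshold meets the target level: select nothing


def knockoff_select(W, q):
    """Indices of variables whose statistic reaches the threshold."""
    t = knockoff_plus_threshold(W, q)
    return [j for j, w in enumerate(W) if w >= t]


W = [5.0, 4.0, 3.0, 2.5, -1.0, 2.0, 1.5, -0.5, 0.2]  # hypothetical statistics
threshold = knockoff_plus_threshold(W, q=0.2)
selected = knockoff_select(W, q=0.2)
print(threshold, selected)
```

Negative statistics act as a stand-in count of false positives, so the ratio inside the loop is a conservative estimate of the fraction of false discoveries at each candidate threshold.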

Tuesday, 5/16/2017, 2:00PM – 3:00PM
Topic: A New SOP for Accurate and Efficient Community Detection

Kinsey Pavilion 1240B
Frederick K. H. Phoa
Institute of Statistical Science, Academia Sinica, Taipei 115, Taiwan

Community is one of the most important features in social networks. There were many traditional methods in the literature to detect communities in network science and sociological studies, but few were able to identify the statistical significance of the detected communities. Even worse, these methods were computationally infeasible for networks with large numbers of nodes and edges. In this talk, we introduce a new SOP for detecting communities in a social network accurately and efficiently. It consists of four main steps. First, a screening stage is proposed to roughly divide the whole network into communities via complement graph coloring. Then a likelihood-based statistical test is introduced to test for the significance of the detected communities. Once these significant communities are detected, another likelihood-based statistical test is introduced to check for the focus centrality of each community. Finally, a metaheuristic swarm intelligence based (SIB) method is proposed to fine tune the range of each community from its original circular setting. Some famous networks are used as empirical data to demonstrate the process of this new SOP.

Tuesday, 05/09/2017, 2:00PM – 3:00pm
Topic: Neyman-Pearson (NP) classification algorithms and NP receiver operating characteristic (NP-ROC)

Kinsey Pavilion 1240B
Xin Tong, Assistant Professor
Department of Data Sciences and Operations, Marshall School of Business, University of Southern California

In many binary classification applications, such as disease diagnosis and spam detection, practitioners commonly face the need to limit type I error (i.e., the conditional probability of misclassifying a ‘normal’, or class 0, observation as ‘abnormal’, or class 1) so that it remains below a desired threshold. To address this need, the Neyman-Pearson (NP) classification paradigm is a natural choice; it minimizes type II error (i.e., the conditional probability of misclassifying a class 1 observation as class 0) while enforcing an upper bound, α, on the type I error. Although the NP paradigm has a century-long history in hypothesis testing, it has not been well recognized and implemented in statistical classification schemes. Common practices that directly limit the empirical type I error to no more than α do not satisfy the type I error control objective because the resulting classifiers are still likely to have type I errors much larger than α. As a result, the NP paradigm has not been properly implemented for many classification scenarios in practice. In this work, we develop the first umbrella algorithm that implements the NP paradigm for all scoring-type classification methods, including popular methods such as logistic regression, support vector machines and random forests. Powered by this umbrella algorithm, we propose a novel graphical tool for NP classification methods: NP receiver operating characteristic (NP-ROC) bands, motivated by the popular receiver operating characteristic (ROC) curves. NP-ROC bands will help choose α in a data adaptive way, compare different NP classifiers, and detect possible overfitting. We demonstrate the use and properties of the NP umbrella algorithm and NP-ROC bands, available in the R package nproc, through simulation and real data case studies.
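
One way to see why bounding the empirical type I error is not enough is to ask how high a score threshold must sit among held-out class 0 scores so that the population type I error exceeds α only with small probability. The sketch below uses an order-statistic argument with a binomial tail bound. It is written in Python rather than R, the tolerance `delta` and all function names are assumptions introduced here, and it is not the API of the nproc package.

```python
from math import comb


def violation_probability(k: int, n: int, alpha: float) -> float:
    """P(type I error > alpha) when thresholding at the k-th smallest of n
    i.i.d. class 0 scores with a continuous distribution."""
    return sum(
        comb(n, j) * (1 - alpha) ** j * alpha ** (n - j) for j in range(k, n + 1)
    )


def np_threshold_rank(n: int, alpha: float, delta: float):
    """Smallest rank k whose violation probability is at most delta, else None."""
    for k in range(1, n + 1):
        if violation_probability(k, n, alpha) <= delta:
            return k
    return None  # too few class 0 scores to give the guarantee


def np_threshold(class0_scores, alpha: float, delta: float):
    """Score threshold: classify as class 1 only when a score exceeds it."""
    k = np_threshold_rank(len(class0_scores), alpha, delta)
    return None if k is None else sorted(class0_scores)[k - 1]


rank_59 = np_threshold_rank(59, alpha=0.05, delta=0.05)
rank_58 = np_threshold_rank(58, alpha=0.05, delta=0.05)
print(rank_59, rank_58)
```

Under these assumptions, with α = 0.05 and tolerance 0.05, at least 59 class 0 scores are needed, and with exactly 59 the threshold must be the largest of them. This is considerably more conservative than taking the empirical 95th percentile.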

Thursday, 5/04/2017, 2:00PM – 3:30pm
de Leeuw Seminar: Festschrift Reloaded – The Lion Strikes Back

Location: 314 Royce Hall

Patrick Mair, Senior Lecturer in Statistics
Department of Psychology, Harvard

Katharine Mullen, Adjunct Assistant Professor
UCLA Department of Statistics

In September 2016 the Journal of Statistical Software (JSS) published a Festschrift for Jan de Leeuw, founding chair of the Department of Statistics at UCLA and founding editor of JSS. The Festschrift commemorated Jan’s retirement as well as the 20-year anniversary of JSS.  Six contributions surveyed Jan’s methodological contributions on topics such as multiway analysis, Gifi, multidimensional scaling, and other, somewhat more exotic scaling approaches. One contribution traced the development of R and other statistical software in the pages of JSS. The final paper by Don Ylvisaker looked back at the early days of the Department of Statistics at UCLA.  In this talk, the editors of the Festschrift reflect on some of the highlights presented in these contributions, discuss Jan’s role in these developments, and outline some newer research topics Jan has been working on over the last few months.

More information is available here.

Tuesday, 05/02/2017, 2:00PM – 3:00pm
Topic: Robust inference in high-dimensional models – going beyond sparsity principles

Kinsey Pavilion 1240B
Jelena Bradic, Assistant Professor of Statistics
Department of Mathematics, University of California, San Diego

In high-dimensional linear models the sparsity assumption is typically made, stating that most of the parameters are equal to zero. Under the sparsity assumption, estimation and, recently, inference have been well studied. However, in practice, the sparsity assumption is not checkable and, more importantly, is often violated, with a large number of covariates expected to be associated with the response, indicating that possibly all, rather than just a few, parameters are non-zero. A natural example is genome-wide gene expression profiling, where all genes are believed to affect a common disease marker. We show that existing inferential methods are sensitive to the sparsity assumption, and may, in turn, result in a severe lack of control of Type I error. In this article, we propose a new inferential method, named CorrT, which is robust and adaptive to the sparsity assumption. CorrT is shown to have Type I error approaching the nominal level and Type II error approaching zero, regardless of how sparse or dense the model is. In fact, CorrT is also shown to be optimal whenever sparsity holds. Numerical and real data experiments show a favorable performance of the CorrT test compared to the state-of-the-art methods.

Tuesday, 4/25/2017, 2:00PM – 3:00pm
Topic: Asynchronous Parallel Algorithms for Fixed-Point Problems and Large-Scale Optimization

Kinsey Pavilion 1240B
Wotao Yin, Professor
Department of Mathematics, UCLA

Many problems reduce to the fixed-point problem of solving x=T(x). This talk discusses a coordinate-friendly structure in the operator T that prevails in many optimization problems and enables highly efficient asynchronous algorithms. By “asynchronous”, we mean that the algorithm runs on multiple threads and each thread computes with possibly delayed information from the other threads. The threads do not coordinate their iterations. On modern computer architecture, asynchronous algorithms can be an order of magnitude faster than the standard (synchronous) parallel algorithms! We demonstrate how to solve large-scale applications in machine learning, image processing, portfolio optimization, and second-order cone programs with asynchronous algorithms.

We also present a fundamental theoretical result: as long as T has a fixed point and is nonexpansive, the asynchronous coordinate-update algorithm converges to a fixed point under either a bounded delay or certain kinds of unbounded delays. The operator does not have to be contractive.

This is joint work with many current and former students.
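
The idea of updating coordinates of x=T(x) with delayed information can be imitated in a single-threaded simulation. The sketch below is only an illustration under simplifying assumptions: it uses an affine map T(x) = Mx + c that is contractive in the max norm (stronger than the nonexpansiveness required by the result above), it models delay by reading from a randomly chosen recent snapshot, and every name and number in it is hypothetical.

```python
import random


def async_coordinate_iteration(M, c, num_updates=2000, max_delay=3, seed=0):
    """Simulate asynchronous coordinate updates for T(x) = M x + c.

    Each step picks one coordinate at random and recomputes it from a
    snapshot of x that may be up to max_delay updates old.
    """
    rng = random.Random(seed)
    n = len(c)
    x = [0.0] * n
    history = [list(x)]                      # recent snapshots, oldest first
    for _ in range(num_updates):
        i = rng.randrange(n)                 # coordinate to update
        delay = rng.randint(0, len(history) - 1)
        stale = history[-1 - delay]          # possibly outdated view of x
        x[i] = sum(M[i][j] * stale[j] for j in range(n)) + c[i]
        history.append(list(x))
        if len(history) > max_delay + 1:
            history.pop(0)                   # keep the delay bounded
    return x


# Hypothetical problem: each row of M has absolute sum 0.5, so T is a contraction.
M = [[0.0, 0.3, -0.2], [0.1, 0.0, 0.4], [-0.25, 0.25, 0.0]]
x_star = [1.0, -2.0, 3.0]                    # chosen fixed point
c = [x_star[i] - sum(M[i][j] * x_star[j] for j in range(3)) for i in range(3)]

x = async_coordinate_iteration(M, c)
error = max(abs(a - b) for a, b in zip(x, x_star))
print(x, error)
```

Even though every update may read stale values and no update order is enforced, the iterate settles at the fixed point, which is the behavior the bounded-delay convergence result describes in a far more general setting.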

Tuesday, 4/11/2017, 2:00PM – 3:00pm
Topic: Scalable Bayesian Models of Interacting Time Series

Boelter Hall 5436
Emily Fox, Amazon Professor of Machine Learning
Department of Statistics, University of Washington

Data streams of increasing complexity and scale are being collected in a variety of fields ranging from neuroscience and genomics to e-commerce. Modeling the intricate relationships between the large collection of series can lead to increased predictive performance and domain-interpretable structures. For scalability, it is crucial to discover and exploit sparse dependencies between the data streams. Such representational structures for independent data sources have been studied extensively, but have received limited attention in the context of time series. In this talk, we present a series of models for capturing such sparse dependencies via clustering, graphical models, and low-dimensional embeddings of time series. We explore these methods in a variety of applications, including house price modeling and inferring networks in the brain. We likewise discuss methods for scaling Bayesian inference to large datasets.

We then turn to observed interaction data, and briefly touch upon how to devise statistical network models that capture important network features like sparsity of edge connectivity. Within our Bayesian framework, a key insight is to move to a continuous-space representation of the graph, rather than the typical discrete adjacency matrix structure. We demonstrate our methods on a series of real-world networks with up to hundreds of thousands of nodes and millions of edges.


Emily Fox is currently the Amazon Professor of Machine Learning in the Statistics Department at the University of Washington. She received an S.B. in 2004 and Ph.D. in 2009 from the Department of Electrical Engineering and Computer Science at MIT. She has been awarded a Presidential Early Career Award for Scientists and Engineers (PECASE, 2017), Sloan Research Fellowship (2015), an ONR Young Investigator award (2015), an NSF CAREER award (2014), the Leonard J. Savage Thesis Award in Applied Methodology (2009), and the MIT EECS Jin-Au Kong Outstanding Doctoral Thesis Prize (2009). Her research interests are in large-scale Bayesian dynamic modeling and computations.

Wednesday, 3/15/2017, 1:30PM
Topic: Metareasoning and Mental Simulation
UCLA Statistics/Communication Studies Seminar

4242 Young Hall (JIFRESSE Seminar Room)
Jessica Hamrick, Ph.D. Candidate
Department of Psychology, University of California, Berkeley

At any given moment, how should an intelligent agent decide what to think about, how to think about it, and how long to think for? My research attempts to answer these questions by focusing on the best examples of intelligent agents that we have: humans. In particular, I study how people use their “mental simulations”, which can be thought of as samples from a rich generative model of the world. I show how people adaptively use their mental simulations to learn new things about the world; that they choose which simulations to run based on which they think will be more informative; and that they allocate their cognitive resources to spend less time on easy problems and more time on hard problems. Based on these results, I will illustrate how statistics, machine learning, and cognitive science can complement one another. On the one hand, I will show how ideas from cognitive science can inform and inspire new approaches in machine learning; on the other hand, I will discuss how the mathematical models of human cognition that I develop in my research can be used to build agents that are better able to reason about and communicate with human collaborators.

Tuesday, 3/14/2017, 2:00 PM – 3:00 PM
Topic: Human Behavior in Techno-Social Systems

Physics and Astronomy Building 1434A
Emilio Ferrara, Research Assistant Professor
Department of Mathematics

The increasing availability of data across different socio-technical systems, such as online social networks, social media, and mobile phone networks, presents novel challenges and intriguing research opportunities. As more online services permeate through our everyday life and as data from various domains are connected and integrated with each other, the boundary between the real and the online worlds becomes blurry. Such data convey both online and offline activities of people, as well as multiple time scales and resolutions. In this talk, I’ll discuss my research efforts aimed at characterizing and predicting human behavior and activities in techno-social worlds: starting by discussing network structure and information spreading on large online social networks, I’ll move toward characterizing entire online conversations, such as those around big real-world events, to capture the dynamics driving the emergence of collective attention. I’ll describe a machine learning framework leveraging these insights to detect promoted campaigns that mimic grassroots conversation. Aiming at learning the signature of abuse at the level of single individuals, I’ll illustrate the challenges posed by characterizing human activity as opposed to that of synthetic entities (social bots) that attempt to emulate us, to persuade, smear, tamper or deceive. I’ll then explore a variety of such applications and problems, including the study of emotions, cognitive heuristics, etc.

Bio. Dr. Emilio Ferrara is Research Assistant Professor at the University of Southern California, Research Leader at the USC Information Sciences Institute, and Principal Investigator at the USC/ISI Machine Intelligence and Data Science (MINDS) research group. Ferrara’s research interests include modeling and predicting individual behavior in techno-social systems, characterizing information diffusion, and predicting crime and abuse in such environments. He has held various research positions in institutions in Italy, Austria, and the UK (2009-2012). Before joining USC in 2015, he was research faculty in the School of Informatics and Computing of Indiana University (2012-2015). Ferrara holds a Ph.D. in Computer Science from the University of Messina (Italy), and has published over 80 articles on social networks, machine learning, and network science, which have appeared in venues such as Proceedings of the National Academy of Sciences, Communications of the ACM, Physical Review Letters, and several ACM and IEEE transactions and conferences. His research on social networks has been featured in major news outlets (TIME, BBC, The New York Times, etc.) and tech magazines (MIT Technology Review, Vice, Mashable, New Scientist, etc.). He was named IBM Watson Big Data Influencer in 2015, and Maptive’s Big Data Expert to Follow in 2016. He received the 2016 DARPA Young Faculty Award, and the 2016 Complex Systems Society Junior Scientific Award “For outstanding contributions to complex systems sciences.” His research is supported by DARPA, IARPA, Air Force, and Office of Naval Research.

Friday, 3/10/2017, 1:30PM
Topic: Visual Roots of Social Cognition
UCLA Statistics/Communication Studies Seminar

4242 Young Hall (JIFRESSE Seminar Room)
Tao Gao, Research Scientist
Massachusetts Institute of Technology, Computer Vision Lab, GE Research

Intelligent human communication relies on our remarkable abilities of understanding others’ mental states, including beliefs, desires, goals and intentions. As a case study of “Theory of Mind” (ToM), I have been exploring the perceptual roots of social cognition – focusing on “reverse-engineering” human abilities to recognize animacy, agency, and intentionality in visual scenes. My work in this area touches on themes from cognitive science, computer vision, social psychology, and neuroscience. In my talk, I will describe several different kinds of studies that contribute to this overall research project, including explorations of (1) just what kinds of cues trigger this form of perception; (2) how the perception of these social properties influences subsequent cognition and action; and (3) how to build models of social perception through computer vision and machine learning. Collectively, these projects show how the perception of animacy and intentionality is deeply wired into our minds, and how it can be modeled for higher-level social communication and collaboration.

Wednesday, 3/08/2017, 1:30PM
Topic: Towards Natural, Intelligent, and Collaborative Human-Agent Communication
UCLA Statistics/Communication Studies Seminar

4242 Young Hall (JIFRESSE Seminar Room)
Joyce Chai, Professor
Department of Computer Science and Engineering, Michigan State University

Enabling situated communication with artificial agents (e.g., robots) faces many challenges. Humans and artificial agents have different levels of linguistic, task, and world knowledge. In situated communication with robots, although humans and robots are co-present in a shared physical environment, they have mismatched capabilities in perceiving and interpreting the shared environment and joint tasks. All of these significantly jeopardize the common ground between partners, making language-based communication extremely difficult. To address this problem, we have developed several computational collaborative models for language processing which are motivated by cooperative principles in human communication. We have also developed approaches to allow artificial agents to continuously acquire knowledge through communicating with humans to establish common ground.  In this talk, I will give an introduction to this research effort and particularly focus on referential communication and interactive verb acquisition and action learning.

Tuesday, 3/7/2017, 2:00 PM – 3:00 PM
Topic: Exploration of Large Networks Via Latent Space Modeling

Physics and Astronomy Building 1434A
Zongming Ma
Department of Statistics
University of Pennsylvania

Latent space models are effective tools for statistical network data analysis. The present paper presents two fitting algorithms for a broad class of latent space models that can effectively model network characteristics such as degree heterogeneity, transitivity, homophily, etc. The methods are motivated by inner-product models, but are readily applicable to more general models that allow latent vectors to affect edge formation in flexible ways. Both methods are scalable to very large networks and have fast rates of convergence. The effectiveness of the modeling approach and fitting methods is demonstrated on a number of simulated and real world network datasets.

Friday, 3/3/2017, 11:00AM
Topic: Geometry of Neural Networks
UCLA Statistics/Mathematics Seminar

Faculty Center – Cypress Room
Guido Montufar, Postdoctoral Scientist
Max Planck Institute for Mathematics in the Sciences & Leipzig University

Deep Learning is one of the most successful machine learning approaches to artificial intelligence. In this talk I discuss the geometry of neural networks as a way to study the success of Deep Learning at a mathematical level and to develop a theoretical basis for making further advances, especially in situations with limited amounts of labeled data and challenging problems in reinforcement learning. I present some recent results on the representational power of neural networks. Then I demonstrate how the capacity of a neural network can be aligned with the structure of perception-action problems in order to obtain more efficient learning systems.

Thursday, 3/2/2017, 1:30 PM
Topic: Computational and Statistical Convergence for Graph Estimation: Just Relax
UCLA Statistics/Mathematics Seminar

Faculty Center – Cypress Room
Shuheng Zhou, Assistant Professor
Department of Statistics
University of Michigan

The general theme of my research in recent years is spatio-temporal modeling and sparse recovery with high dimensional data under measurement error. In this talk, I will discuss several computational and statistical convergence results on graph and sparse vector recovery problems. Our analysis reveals interesting connections between computational and statistical efficiency and the concentration of measure phenomenon in random matrix theory. Our methods are applicable to many application domains such as neuroscience, geoscience and spatio-temporal modeling, genomics, and network data analysis. I will present theory, simulation and data examples. Part of this talk is based on joint work with Mark Rudelson.

Tuesday, 2/28/2017, 2:00 PM–3:00 PM
Topic: Convexified Modularity Maximization for Degree-Corrected Stochastic Block Models

Physics and Astronomy Building 1434A
Xiaodong Li
Department of Statistics
UC Davis

The stochastic block model (SBM) is a popular framework for studying community detection in networks. This model is limited by the assumption that all nodes in the same community are statistically equivalent and have equal expected degrees. The degree-corrected stochastic block model (DCSBM) is a natural extension of SBM that allows for degree heterogeneity within communities. This paper proposes a convexified modularity maximization approach for estimating the hidden communities under DCSBM. This approach is based on a convex programming relaxation of the classical (generalized) modularity maximization formulation, followed by a novel doubly-weighted \ell_1-norm k-median procedure. In view of a novel degree-corrected density gap condition, we establish non-asymptotic theoretical guarantees for both approximate clustering and perfect clustering. In the special case of SBM, these theoretical results match the best-known performance guarantees of computationally feasible algorithms. Numerically, we provide an efficient implementation of our algorithm, which is applied to both synthetic and real-world networks. Experimental results show that our method enjoys competitive empirical performance compared to the state-of-the-art tractable methods in the literature. This is joint work with Yudong Chen and Jiaming Xu.
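
To make the difference between SBM and DCSBM concrete, the sketch below samples a small DCSBM network in which the edge probability between nodes i and j is theta_i * theta_j * B[c_i][c_j]. The function names and the example values of `theta` and `B` are hypothetical, and the code only generates data; it does not implement the convexified modularity maximization described in the abstract.

```python
import random


def dcsbm_edge_probability(i, j, labels, theta, B):
    """P(edge between i and j) = theta_i * theta_j * B[c_i][c_j], capped at 1."""
    return min(1.0, theta[i] * theta[j] * B[labels[i]][labels[j]])


def sample_dcsbm(labels, theta, B, seed=0):
    """Sample a symmetric 0/1 adjacency matrix with no self-loops."""
    rng = random.Random(seed)
    n = len(labels)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < dcsbm_edge_probability(i, j, labels, theta, B):
                adj[i][j] = adj[j][i] = 1
    return adj


labels = [0, 0, 0, 1, 1, 1]             # community of each node
theta = [1.0, 0.5, 0.2, 1.0, 0.5, 0.2]  # degree parameters; all equal gives the plain SBM
B = [[0.9, 0.1], [0.1, 0.9]]            # block connectivity matrix
adj = sample_dcsbm(labels, theta, B)
print([sum(row) for row in adj])        # observed degrees
```

Setting every entry of `theta` to the same value recovers the SBM, where all nodes in a community have equal expected degree.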

Friday, 2/24/2017, 11:00 AM
Combinatorial Inference
UCLA Statistics/Mathematics Seminar

Faculty Center, Cypress Room
Han Liu
Assistant Professor
Princeton University, Department of Operations Research and Financial Engineering

We propose a new family of combinatorial inference problems for graphical models. Unlike classical statistical inference where the main interest is point estimation or parameter testing of Euclidean parameters, combinatorial inference aims at testing the global structure of the underlying graph. Examples include testing the graph connectivity, the presence of a cycle of certain size, or the maximum degree of the graph. To begin with, we develop a unified theory for the fundamental limits of a large family of combinatorial inference problems. We propose new structural packing entropies to characterize how the complexity of combinatorial graph structures impacts the corresponding minimax lower bounds. On the other hand, we propose a family of practical structural testing algorithms to match the obtained lower bounds. We use a case study of brain network analysis to illustrate the usefulness of these proposed methods.

Thursday, 2/23/2017, 1:30 PM
Topic: ePCA: Exponential family PCA
UCLA Statistics/Mathematics Seminar

Faculty Center, Cypress Room
Edgar Dobriban
PhD Candidate
Stanford University, Department of Statistics

Many applications, such as photon-limited imaging and genomics, involve large datasets with entries from exponential family distributions. It is of interest to estimate the covariance structure and principal components of the noiseless distribution. Principal Component Analysis (PCA), the standard method for this setting, can be inefficient for non-Gaussian noise. In this talk we present ePCA, a methodology for PCA on exponential family distributions. ePCA involves the eigendecomposition of a new covariance matrix estimator, constructed in a deterministic non-iterative way using moment calculations, shrinkage, and random matrix theory. We provide several theoretical justifications for our estimator, including the Marchenko-Pastur law in high dimensions. We illustrate ePCA by denoising molecular diffraction maps obtained using photon-limited X-ray free electron laser (XFEL) imaging. This is joint work with Lydia T. Liu and Amit Singer.
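
One example of the kind of moment calculation involved, assumed here for illustration and not spelled out in the abstract, is the Poisson case: because Poisson noise has variance equal to its mean, the covariance of observed counts equals the covariance of the noiseless intensities plus a diagonal matrix of mean intensities. The sketch below subtracts that diagonal term from the sample covariance. The function name `debiased_covariance` and the 1/(n-1) normalization are hypothetical choices, and the shrinkage and random-matrix steps mentioned in the abstract are not covered.

```python
def debiased_covariance(rows):
    """Sample covariance of the rows minus diag(column means)."""
    n = len(rows)
    d = len(rows[0])
    mean = [sum(r[k] for r in rows) / n for k in range(d)]
    cov = [[sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in rows) / (n - 1)
            for b in range(d)] for a in range(d)]
    for k in range(d):
        # Poisson noise has variance equal to its mean, so it adds mean[k] to entry (k, k).
        cov[k][k] -= mean[k]
    return cov


# Each row is one observation of photon counts across three pixels (made-up numbers).
counts = [[3, 0, 1], [5, 2, 0], [4, 1, 2], [2, 1, 1]]
for row in debiased_covariance(counts):
    print(row)
```

The debiased matrix need not be positive semidefinite on small samples, which is one reason further regularization is needed in practice.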

Tuesday, 2/21/2017, 2:00 PM–3:00 PM: A New Theory of Exploration in Reinforcement Learning with Function Approximation

Physics and Astronomy Building 1434A
Alekh Agarwal
Microsoft Research

This talk considers a core question in reinforcement learning (RL): How can we tractably solve sequential decision making problems where the learning agent receives rich observations?

We begin with a new model called Contextual Decision Processes (CDPs) for studying such problems, and show that it encompasses several prior setups for studying RL, such as MDPs and POMDPs. Several special cases of CDPs are, however, known to be provably intractable in their sample complexities. To overcome this challenge, we further propose a structural property of such processes, called the Bellman Rank. We find that the Bellman Rank of a CDP (and an associated class of functions) provides an intuitive measure of the hardness of a problem in terms of sample complexity, that is, the number of samples needed by an agent to discover a near-optimal policy for the CDP. In particular, we propose an algorithm whose sample complexity scales with the Bellman Rank of the process and is completely independent of the size of the observation space of the agent, unlike most prior results. We also show that our techniques are robust to our modeling assumptions, make connections to several known results, and highlight novel consequences of our results.

This talk is based on joint work with Nan Jiang, Akshay Krishnamurthy, John Langford and Rob Schapire.

Tuesday, 2/14/2017, 2:00 PM–3:00 PM: Random Matrices with Heavy-Tailed Entries: Tight Mean Estimators and Applications

Physics and Astronomy Building 1434A
Stanislav Minsker
Department of Mathematics

Estimation of the covariance matrix has attracted significant attention from the statistical research community over the years, partially due to important applications such as Principal Component Analysis. However, the frequently used empirical covariance estimator (and its modifications) is very sensitive to outliers in the data. As P. Huber wrote in 1964, “…This raises a question which could have been asked already by Gauss, but which was, as far as I know, only raised a few years ago (notably by Tukey): what happens if the true distribution deviates slightly from the assumed normal one? As is now well known, the sample mean then may have a catastrophically bad performance…” Motivated by this question, we develop a new estimator of the (element-wise) mean of a random matrix, which includes the covariance estimation problem as a special case. Assuming that the entries of a matrix possess only a finite second moment, this new estimator admits sub-Gaussian or sub-exponential concentration around the unknown mean in the operator norm. We will present extensions of our approach to matrix-valued U-statistics, as well as applications to covariance estimation and matrix completion problems.

Part of the talk will be based on joint work with Xiaohan Wei.

Tuesday, 2/7/2017, 2:00 PM–3:00 PM: Matrix Completion, Saddlepoints, and Gradient Descent

Physics and Astronomy Building 1434A
Jason Lee
Data Sciences and Operations Department

Matrix completion is a fundamental machine learning problem with wide applications in collaborative filtering and recommender systems. Typically, matrix completion problems are solved by non-convex optimization procedures, which are empirically extremely successful. We prove that the symmetric matrix completion problem has no spurious local minima, meaning all local minima are also global. Thus the matrix completion objective has only saddlepoints and global minima.

Next, we show that saddlepoints are easy to avoid even for Gradient Descent, arguably the simplest optimization procedure. We prove that with probability 1, randomly initialized Gradient Descent converges to a local minimizer. The same result holds for a large class of optimization algorithms including proximal point, mirror descent, and coordinate descent.
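
A toy illustration of this behavior, under assumptions that go beyond the abstract: the sketch below runs plain gradient descent on the fully observed rank-one objective f(u) = 0.25 * ||u u^T - v v^T||_F^2, whose only critical points are a saddle at u = 0 and the global minima u = v and u = -v. The target `v`, the step size, the iteration count, and the uniform initialization are hypothetical choices for the example; the talk's results concern the partially observed matrix completion problem, which is not reproduced here.

```python
import random


def gradient(u, v):
    """Gradient of f(u) = 0.25 * ||u u^T - v v^T||_F^2, i.e. (u u^T - v v^T) u."""
    uu = sum(a * a for a in u)
    uv = sum(a * b for a, b in zip(u, v))
    return [uu * a - uv * b for a, b in zip(u, v)]


def gradient_descent(v, seed, step=0.1, iters=2000):
    """Run fixed-step gradient descent from a random starting point."""
    rng = random.Random(seed)
    u = [rng.uniform(-1.0, 1.0) for _ in v]
    for _ in range(iters):
        g = gradient(u, v)
        u = [a - step * ga for a, ga in zip(u, g)]
    return u


v = [1.0, 0.5]
for seed in range(3):
    u = gradient_descent(v, seed)
    print(seed, [round(a, 4) for a in u])
```

The gradient vanishes at u = 0, so an iterate started exactly there would never move, but a randomly drawn starting point lands there with probability zero and the iterates drift toward v or -v instead.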

Tuesday, 1/31/2017, 2:00 PM–3:00 PM: Latent Variable Models for Inferring Structure in Genomic Data

Physics and Astronomy Building 1434A
Sriram Sankararaman
Department of Computer Science

Genomic analyses have revealed that admixture, a process in which populations mix and exchange genes, has been a common process through human evolution. Understanding the genomic ancestry of an admixed individual is an important step in genomic medicine, social science and evolutionary biology. To this end, we need to develop statistical models that are detailed enough to capture the key biological signals while permitting efficient inference across large-scale genomic datasets.

In this talk, I will describe two inferential problems that arise in the analysis of admixed genomes and describe how latent variable models provide a natural framework for inference. The first, local ancestry inference, aims to infer the ancestry at every position along an individual’s genome. I will describe a latent variable model for local ancestry inference that captures the fine-scale correlation structure. We show that this model is highly accurate while being computationally efficient for genomic datasets. The second problem is focussed on inferring the genomic ancestry of the parents of an admixed individual. Here, I will introduce the framework of pooled semi-Markov processes and describe efficient inference algorithms in this framework that enable us to accurately infer the genomic ancestries of parents from their offspring’s genome.

Finally, I will show how these models are being applied to learn about genetics of complex diseases as well as to elucidate human history and mating patterns.

Tuesday, 1/24/2017, 2:00 PM—3:00 PM: A Picture of the Energy Landscape of Deep Neural Networks

Physics and Astronomy Building 1434A
Pratik Chaudhari
Department of Computer Science

Training deep networks with complex architectures requires very careful tuning of a myriad of parameters: batch-size, learning rate schedules and momentum during the optimization process as well as techniques to improve generalization, viz. \ell_2 regularization, dropout, batch-normalization, data augmentation etc. Is there a way to disentangle the effect of these techniques, understand their necessity and maybe even improve upon them? In this talk, we will focus on the complexity and geometry of the energy landscape of prototypical deep networks as a means to answer these questions. We shall first paint a picture of the complexity of the underlying optimization problem using ideas from statistical physics such as spin glasses. The geometry of the energy landscape is connected to the generalization performance of deep networks. For instance, empirically, local minima found by SGD have a large proportion of almost-zero eigenvalues in the Hessian with very few positive or negative eigenvalues. We will exploit this observation to construct an algorithm named Entropy-SGD that maximizes a “local free energy” (partition function) and favors well-generalizable solutions in flat regions of the energy landscape while simultaneously avoiding sharp, poorly-generalizable (although possibly deep) valleys. We will discuss connections of this algorithm with belief propagation and ensemble learning. Lastly, we will show experimental results on modern convolutional and recurrent neural networks that demonstrate that Entropy-SGD compares favorably to state-of-the-art techniques in terms of both generalization error and training time.
arXiv: https://arxiv.org/abs/1611.01838
Speaker’s BIO: http://vision.ucla.edu/~pratikac/

Tuesday, 1/17/2017, 2:00 PM—3:00 PM: Tractable Learning in Structured Probability Spaces

Physics and Astronomy Building 1434A
Guy Van den Broeck
Department of Computer Science

Many approaches have been introduced for learning probabilistic models, depending on various data characteristics. In this talk, I will introduce an orthogonal class of machine learning problems that we have been investigating at UCLA, which have not been treated as systematically before. In these problems, one has access to Boolean constraints that characterize examples which are known to be impossible (e.g., due to known domain physics). The task is then to learn a tractable probabilistic model that is guaranteed to assign a zero probability to each impossible example.

I will describe a new class of Arithmetic Circuits, the PSDD, for addressing this class of learning problems. The PSDD is based on advances from both machine learning and logical reasoning and can be learned under Boolean constraints. I will also provide a number of results on learning PSDDs. First, I will contrast PSDD learning with approaches that ignore known constraints, showing how it can learn more accurate models. Second, I will show that PSDDs can be utilized to learn, in a domain-independent manner, distributions over combinatorial objects, such as rankings, game traces and routes on a map. Third, I will show how PSDDs can be learned from a new type of dataset, in which examples are specified using arbitrary Boolean expressions. A number of preliminary case studies will be illustrated throughout the talk, including the unsupervised learning of preference rankings and the supervised learning of classifiers for routes and game traces.

Tuesday, 1/10/2017, 2:00 PM—3:00 PM: Flexible and Interpretable Regression Using Convex Penalties

Physics and Astronomy Building 1434A
Daniela Witten
Department of Statistics and Biostatistics
University of Washington

We consider the problem of fitting a regression model that is both flexible and interpretable. We propose two procedures for this task: the Fused Lasso Additive Model (FLAM), which is an additive model of piecewise constant fits; and Convex Regression with Interpretable Sharp Partitions (CRISP), which extends FLAM to allow for non-additivity. Both FLAM and CRISP are the solutions to convex optimization problems that can be efficiently solved. We show that FLAM and CRISP outperform competitors, such as sparse additive models (Ravikumar et al., 2009), CART (Breiman et al., 1984), and thin plate splines (Duchon, 1977), in a range of settings. We propose unbiased estimators for the degrees of freedom of FLAM and CRISP, which allow us to characterize their complexity. This is joint work with Ashley Petersen and Noah Simon at the University of Washington.