Past Seminars

Tuesday, 6/06/2017, 2:00PM – 3:00PM
Scaling and Generalizing Bayesian Inference

Kinsey Pavilion 1240B
David Blei, Professor
Department of Statistics and Computer Science, Columbia University


Bayesian statistics and expressive probabilistic modeling have become key tools for the modern statistician. They let us express complex assumptions about the hidden structures that underlie our data and have been successfully applied in numerous fields.

The central computational problem in Bayesian statistics is posterior inference, the problem of approximating the conditional distribution of the hidden variables given the observations. Approximate posterior inference algorithms have revolutionized the field, revealing its potential as a usable and general-purpose language for data analysis.

Bayesian statistics, however, has not yet reached this potential. First, statisticians and scientists regularly encounter massive data sets, but existing approximate inference algorithms do not easily scale. Second, most approximate inference algorithms are not generic; each must be adapted to the specific model at hand.

In this talk I will discuss our recent research on addressing these two limitations. I will first describe stochastic variational inference, an approximate inference algorithm for handling massive data sets and I will demonstrate its application to probabilistic topic models of millions of articles. Then I will discuss black box variational inference, a generic algorithm for approximating the posterior. Black box inference easily applies to many models with little model-specific derivation and few restrictions on their properties. I will demonstrate its use on deep exponential families and describe how it enables powerful tools for probabilistic programming.

Tuesday, 5/30/2017, 2:00PM – 3:00PM
Prediction under Asymmetry: Empirical Bayes predictive methods for Check loss

Kinsey Pavilion 1240B
Gourab Mukherjee, Assistant Professor
Data Sciences and Operations, University of Southern California

A host of modern applications require prediction under asymmetric loss functions. The check loss is piecewise linear and is widely used in such applications for penalizing underestimation and overestimation in different ways. Here, we develop new empirical Bayes methods that can produce optimal prediction under agglomerative check losses in high dimensional hierarchical models. Because of the nature of this loss, our inferential target is a pre-chosen quantile of the predictive distribution rather than the mean of the predictive distribution. In common with many other problems we find that empirical Bayes shrinkage provides better performance than simple coordinate-wise rules. However, the problem here differs in fundamental respects from estimation or prediction under the quadratic losses considered in most of the previous literature. This necessitates different strategies for creation of effective empirical Bayes predictors. We develop new methods for constructing uniformly efficient asymptotic risk estimates for conditionally linear predictors. Minimizing these risk estimates we obtain an empirical Bayes prediction rule which has asymptotic optimality properties not shared by EB strategies that use maximum likelihood or method-of-moments to estimate the hyper-parameters.

Tuesday, 5/25/2017, 2:00PM – 3:00PM
Power Enhancement Tests for High Dimensional Data with Applications to Genetic and Genomic Studies

Kinsey Pavilion 1240B
Lingzhou Xue, Assistant Professor
Department of Statistics, Pennsylvania State University

In this talk, I will discuss some new insights in hypothesis tests for analysis of high-dimensional data, which are motivated by genetic and genomic studies. In the current literature, two sets of test statistics are commonly used for various high-dimensional tests: 1) using extreme-value form statistics to test against sparse alternatives, and 2) using quadratic form statistics to test against dense alternatives. However, quadratic form statistics suffer from low power against sparse alternatives, and extreme-value form statistics suffer from low power against dense alternatives with small disturbances and may have size distortions due to its slow convergence. For real-world applications, it is important to derive powerful testing procedures against more general alternatives. Based on their joint limiting laws, we introduce new power enhancement testing procedures to boost the power against more general alternatives and retain the correct asymptotic size. Under the high-dimensional setting, we derive the closed-form limiting null distributions, and obtain their explicit rates of uniform convergence. We demonstrate the performance of our proposed test statistics in numerical studies.

Wednesday, 5/24/2017, 12:00pm – 1:30pm
Predicting the Evolution of Intrastate Conflict: Evidence from Nigeria

Presented by the Center for Social Statistics.

4240 Public Affairs Building
Shahryar Minhas
Postdoctoral Fellow, Duke University
Assistant Professor, Michigan State University
Department of Political Science and the Social Science Data Analytics Program (SSDA)

The endogenous nature of civil conflict has limited scholars’ abilities to draw clear inferences about the drivers of conflict evolution. We argue that three primary features characterize the complexity of intrastate conflict: (1) the interdependent relationships of conflict between actors; (2) the impact of armed groups on violence as they enter or exit the conflict network; and (3) the ability of civilians to influence the strategic interactions of armed groups. Using ACLED event data on Nigeria, we apply a novel network-based approach to predict the evolution of intrastate conflict dynamics. Our network approach yields insights about the effects of civilian victimization and key actors entering the conflict. Attacks against civilians lead groups to both be more violent, and to become the targets of attacks in subsequent periods. Boko Haram’s entrance into the civil war leads to an increase in violence even in unrelated dyads. Further, our approach significantly outperforms more traditional dyad-group approaches at predicting the incidence of conflict.

Tuesday, 5/23/2017, 2:00PM – 3:00PM
Panning for gold: Model-free knockoffs for high-dimensional controlled variable selection

Kinsey Pavilion 1240B
Jinchi Lv, Associate Professor
Data Sciences and Operations, University of Southern California

Many contemporary large-scale applications involve building interpretable models linking a large set of potential covariates to a response in a nonlinear fashion, such as when the response is binary. Although this modeling problem has been extensively studied, it remains unclear how to effectively control the fraction of false discoveries even in high-dimensional logistic regression, not to mention general high-dimensional nonlinear models. To address such a practical problem, we propose a new framework of model-free knockoffs, which reads from a different perspective the knockoff procedure (Barber and Candès, 2015) originally designed for controlling the false discovery rate in linear models. The key innovation of our method is to construct knockoff variables probabilistically instead of geometrically. This enables model-free knockoffs to deal with arbitrary (and unknown) conditional models and any dimensions, including when the dimensionality p exceeds the sample size n, while the original knockoffs procedure is constrained to homoscedastic linear models with n ≥ p. Our approach requires the design matrix be random (independent and identically distributed rows) with a covariate distribution that is known, although we show our procedure to be robust to unknown/estimated distributions. To our knowledge, no other procedure solves the \textit{controlled} variable selection problem in such generality, but in the restricted settings where competitors exist, we demonstrate the superior power of knockoffs through simulations. Finally, we apply our procedure to data from a case-control study of Crohn’s disease in the United Kingdom, making twice as many discoveries as the original analysis of the same data. This is a joint work with Emmanuel Candès, Yingying Fan and Lucas Janson.

Tuesday, 5/16/2017, 2:00PM – 3:00PM
Topic: A New SOP for Accurate and Efficient Community Detection

Kinsey Pavilion 1240B
Frederick K. H. Phoa
Institute of Statistical Science, Academia Sinica, Taipei 115, Taiwan

Community is one of the most important features in social networks. There were many traditional methods in the literature to detect communities in network science and sociological studies, but few were able to identify the statistical significance of the detected communities. Even worse, these methods were computationally infeasible for networks with large numbers of nodes and edges. In this talk, we introduce a new SOP for detecting communities in a social network accurately and efficiently. It consists of four main steps. First, a screening stage is proposed to roughly divide the whole network into communities via complement graph coloring. Then a likelihood-based statistical test is introduced to test for the significance of the detected communities. Once these significant communities are detected, another likelihood-based statistical test is introduced to check for the focus centrality of each community. Finally, a metaheuristic swarm intelligence based (SIB) method is proposed to fine tune the range of each community from its original circular setting. Some famous networks are used as empirical data to demonstrate the process of this new SOP.

Tuesday, 05/09/2017, 2:00PM – 3:00pm
Topic: Neyman-Pearson (NP) classification algorithms and NP receiver operating characteristic (NP-ROC)

Kinsey Pavilion 1240B
Xin Tong, Assistant Professor
Department of Data Sciences and Operations, Marshall School of Business, University of Southern California

In many binary classification applications, such as disease diagnosis and spam detection, practitioners commonly face the need to limit type I error (i.e., the conditional probability of misclassifying a ‘normal’, or class $0$, observation as ‘abnormal’, or class $1$) so that it remains below a desired threshold. To address this need, the Neyman-Pearson (NP) classification paradigm is a natural choice; it minimizes type II error (i.e., the conditional probability of misclassifying a class $1$ observation as class $0$) while enforcing an upper bound, $\alpha$, on the type I error. Although the NP paradigm has a century-long history in hypothesis testing, it has not been well recognized and implemented in statistical classification schemes. Common practices that directly limit the empirical type I error to no more than $\alpha$ do not satisfy the type I error control objective because the resulting classifiers are still likely to have type I errors much larger than $\alpha$. As a result, the NP paradigm has not been properly implemented for many classification scenarios in practice. In this work, we develop the first umbrella algorithm that implements the NP paradigm for all scoring-type classification methods, including popular methods such as logistic regression, support vector machines and random forests. Powered by this umbrella algorithm, we propose a novel graphical tool for NP classification methods: NP receiver operating characteristic (NP-ROC) bands, motivated by the popular receiver operating characteristic (ROC) curves. NP-ROC bands will help choose $\alpha$ in a data adaptive way, compare different NP classifiers, and detect possible overfitting. We demonstrate the use and properties of the NP umbrella algorithm and NP-ROC bands, available in the \verb+R+ package \verb+nproc+, through simulation and real data case studies.

Thursday, 5/04/2017, 2:00PM – 3:30pm
de Leeuw Seminar: Festschrift Reloaded – The Lion Strikes Back

Location: 314 Royce Hall

Patrick Mair, Senior Lecturer in Statistics
Department of Psychology, Harvard

Katharine Mullen, Adjunct Assistant Professor
UCLA Department of Statistics

In September 2016 the Journal of Statistical Software (JSS) published a Festschrift for Jan de Leeuw, founding chair of the Department of Statistics at UCLA and founding editor of JSS. The Festschrift commemorated Jan’s retirement as well as the 20-year anniversary of JSS.  Six contributions surveyed Jan’s methodological contributions on topics such as multiway analysis, Gifi, multidimensional scaling, and other, somewhat more exotic scaling approaches. One contribution traced the development of R and other statistical software in the pages of JSS. The final paper by Don Ylvisaker looked back at the early days of the Department of Statistics at UCLA.  In this talk, the editors of the Festschrift reflect on some of the highlights presented in these contributions, discuss Jan’s role in these developments, and outline some newer research topics Jan has been working on over the last few months.

More information is available here.

Tuesday, 05/02/2017, 2:00PM – 3:00pm
Topic: Robust inference in high-dimensional models – going beyond sparsity principles

Kinsey Pavilion 1240B
Jelena Bradic, Assistant Professor of Statistics
Department of Mathematics, University of California, San Diego

In high-dimensional linear models the sparsity assumption is typically made, stating that most of the parameters are equal to zero. Under the sparsity assumption, estimation and, recently, inference have been well studied. However, in practice, sparsity assumption is not checkable and more importantly is often violated, with a large number of covariates expected to be associated with the response, indicating that possibly all, rather than just a few, parameters are non-zero. A natural example is a genome-wide gene expression profiling, where all genes are believed to affect a common disease marker. We show that existing inferential methods are sensitive to the sparsity assumption, and may, in turn, result in the severe lack of control of Type-I error. In this article, we propose a new inferential method, named CorrT, which is robust and adaptive to the sparsity assumption. CorrT is shown to have Type I error approaching the nominal level and Type II error approaching zero, regardless of how sparse or dense the model. In fact, CorrT is also shown to be optimal whenever sparsity holds. Numerical and real data experiments show a favorable performance of the CorrT test compared to the state-of-the-art methods.

Tuesday, 4/25/2017, 2:00PM – 3:00pm
Topic: Asynchronous Parallel Algorithms for Fixed-Point Problems and Large-Scale Optimization

Kinsey Pavilion 1240B
Wotao Yin, Professor
Department of Mathematics, UCLA

wotaoYINMany problems reduce to the fixed-point problem of solving x=T(x). This talk discusses a coordinate-friendly structure in the operator T that prevails in many optimization problems and enables highly efficient asynchronous algorithms. By “asynchronous”, we mean that the algorithm runs on multiple threads and each thread computes with possibly delayed information from the other threads. The threads do not coordinate their iterations. On modern computer architecture, asynchronous algorithms can be order-of-magnitude faster that the standard (synchronous) parallel algorithms! We demonstrate how to solve large-scale applications in machine learning, image processing, portfolio optimization, and second-order cone programs with asynchronous algorithms.

We also present a fundamental theoretical result: as long as T has a fixed point and is nonexpansive, the asynchronous coordinate-update algorithm converges to a fixed point under either a bounded delay or certain kinds of unbounded delays. The operator does not have to be contractive.

This is joint work with many current and former students.

Tuesday, 4/11/2017, 2:00PM – 3:00pm
Topic: Scalable Bayesian Models of Interacting Time Series

Boelter Hall 5436
Emily Fox, Amazon Professor of Machine Learning
Department of Statistics, University of Washington

Data streams of increasing complexity and scale are being collected in a variety of fields ranging from neuroscience and genomics to e-commerce. Modeling the intricate relationships between the large collection of series can lead to increased predictive performance and domain-interpretable structures. For scalability, it is crucial to discover and exploit sparse dependencies between the data streams. Such representational structures for independent data sources have been studied extensively, but have received limited attention in the context of time series. In this talk, we present a series of models for capturing such sparse dependencies via clustering, graphical models, and low-dimensional embeddings of time series. We explore these methods in a variety of applications, including house price modeling and inferring networks in the brain. We likewise discuss methods for scaling Bayesian inference to large datasets.

We then turn to observed interaction data, and briefly touch upon how to devise statistical network models that capture important network features like sparsity of edge connectivity. Within our Bayesian framework, a key insight is to move to a continuous-space representation of the graph, rather than the typical discrete adjacency matrix structure. We demonstrate our methods on a series of real-world networks with up to hundreds of thousands of nodes and millions of edges.


Emily Fox is currently the Amazon Professor of Machine Learning in the Statistics Department at the University of Washington. She received a S.B. in 2004 and Ph.D. in 2009 from the Department of Electrical Engineering and Computer Science at MIT. She has been awarded a Presidential Early Career Award for Scientists and Engineers (PECASE, 2017), Sloan Research Fellowship (2015), an ONR Young Investigator award (2015), an NSF CAREER award (2014), the Leonard J. Savage Thesis Award in Applied Methodology (2009), and the MIT EECS Jin-Au Kong Outstanding Doctoral Thesis Prize (2009). Her research interests are in large-scale Bayesian dynamic modeling and computations.

Wednesday, 3/15/2017, 1:30PM
Topic: Metareasoning and Mental Simulation
UCLA Statistics/Communication Studies Seminar

4242 Young Hall (JIFRESSE Seminar Room)
Jessica Hamrick, Ph.D. Candidate
Department of Psychology, University of California, Berkeley

At any given moment, how should an intelligent agent decide what to think about, how to think about it, and how long to think for? My research attempts to answer these questions by focusing on the best examples of intelligent agents that we have: humans. In particular, I study how people use their “mental simulations”, which can be thought of as samples from a rich generative model of the world. I show how people adaptively use their mental simulations to learn new things about the world; that they choose which simulations to run based on which they think will be more informative; and that they allocate their cognitive resources to spend less time on easy problems and more time on hard problems. Based on these results, I will illustrate how statistics, machine learning, and cognitive science can complement one another. On the one hand, I will show how ideas from cognitive science can inform and inspire new approaches in machine learning; on the other hand, I will discuss how the mathematical models of human cognition that I develop in my research can be used to build agents that are better able to reason about and communicate with human collaborators.

Tuesday, 3/14/2017, 2:00 PM—3:00 PM
Topic: Human Behavior in Techno-Social Systems

Physics and Astronomy Building 1434A
Emilio Ferrara, Research Assistant Professor
Department of Mathematics

The increasing availability of data across different socio-technical systems, such as online social networks, social media, and mobile phone networks, presents novel challenges and intriguing research opportunities. As more online services permeate through our everyday life and as data from various domains are connected and integrated with each other, the boundary between the real and the online worlds becomes blurry. Such data convey both online and offline activities of people, as well as multiple time scales and resolutions. In this talk, I’ll discuss my research efforts aimed at characterizing and predicting human behavior and activities in techno-social worlds: starting by discussing network structure and information spreading on large online social networks, I’ll move toward characterizing entire online conversations, such as those around big real-world events, to capture the dynamics driving the emergence of collective attention. I’ll describe a machine learning framework leveraging these insights to detect promoted campaigns that mimic grassroots conversation. Aiming at learning the signature of abuse at the level of the single individuals, I’ll illustrate the challenges posed by characterizing human activity as opposed to that of synthetic entities (social bots) that attempt emulate us, to persuade, smear, tamper or deceive. I’ll then explore a variety of such applications and problems to study of emotions, cognitive heuristics, etc.

Bio. Dr. Emilio Ferrara is Research Assistant Professor at the University of Southern California, Research Leader at the USC Information Sciences Institute, and Principal Investigator at the USC/ISI Machine Intelligence and Data Science (MINDS) research group. Ferrara’s research interests include modeling and predicting individual behavior in techno-social systems, characterize information diffusion, and predict crime and abuse in such environments. He has held various research positions in institutions in Italy, Austria, and UK (2009-2012). Before joining USC in 2015, he was research faculty in the School of Informatics and Computing of Indiana University (2012-2015). Ferrara holds a Ph.D. in Computer Science from University of Messina (Italy), and has published over 80 articles on social networks, machine learning, and network science, appeared in venues like Proceeding of the National Academy of Sciences, Communications of the ACM, Physical Review Letters, and several ACM and IEEE transactions and conferences. His research on social networks has been featured on the major news outlets (TIME, BBC, The New York Times, etc.) and tech magazines (MIT Technology Review, Vice, Mashable, New Scientist, etc). He was named IBM Watson Big Data Influencer in 2015, and Maptive’s Big Data Expert to Follow in 2016. He received the 2016 DARPA Young Faculty Award, and the 2016 Complex Systems Society Junior Scientific Award “For outstanding contributions to complex systems sciences.” His research is supported by DARPA, IARPA, Air Force, and Office of Naval Research.

Friday, 3/10/2017, 1:30PM
Topic: Visual Roots of Social Cognition
UCLA Statistics/Communication Studies Seminar

4242 Young Hall (JIFRESSE Seminar Room)
Tao Gao, Research Scientist
Massachusetts Institute of Technology, Computer Vision Lab, GE Research

Intelligent human communication relies on our remarkable abilities of understanding others’ mental states, including beliefs, desires, goals and intentions. As a case study of “Theory of Mind” (ToM), I have been exploring the perceptual roots of social cognition — focusing on “reverse-engineering” human abilities to recognize animacy, agency, and intentionality in visual scenes. My work in this area touches on themes from cognitive science, computer vision, social psychology, and neuroscience. In my talk, I will describe several different kinds of studies that contribute to this overall research project, including explorations of (1) just what kinds of cues trigger this form of perception; (2) how the perception of these social properties influences subsequent cognition and action; and (3) how to build models of social perception through computer vision and machine learning. Collectively, these projects show how the perception of animacy and intentionality is deeply wired into our minds, how it can be modeled for higher-level social communication and collaboration.

Wednesday, 3/08/2017, 1:30PM
Topic: Towards Natural, Intelligent, and Collaborative Human-Agent Communication
UCLA Statistics/Communication Studies Seminar

4242 Young Hall (JIFRESSE Seminar Room)
Joyce Chai, Professor
Department of Computer Science and Engineering, Michigan State University

Enabling situated communication with artificial agents (e.g., robots) faces many challenges. Humans and artificial agents have different levels of linguistic, task, and world knowledge. In situated communication with robots, although humans and robots are co-present in a shared physical environment, they have mismatched capabilities in perceiving and interpreting the shared environment and joint tasks. All of these significantly jeopardize the common ground between partners, making language-based communication extremely difficult. To address this problem, we have developed several computational collaborative models for language processing which are motivated by cooperative principles in human communication. We have also developed approaches to allow artificial agents to continuously acquire knowledge through communicating with humans to establish common ground.  In this talk, I will give an introduction to this research effort and particularly focus on referential communication and interactive verb acquisition and action learning.

Tuesday, 3/7/2017, 2:00 PM—3:00 PM
Topic: Exploration of Large Networks Via Latent Space Modeling

Physics and Astronomy Building 1434A
Zongming Ma
Department of Statistics
University of Pennsylvania

Latent space models are effective tools for statistical network data analysis. The present paper presents two fitting algorithms for a broad class of latent space models that can effectively model network characteristics such as degree heterogeneity, transitivity, homophily, etc. The methods are motivated by inner-product models, but are readily applicable to more general models that allow latent vectors to affect edge formation in flexible ways. Both methods are scalable to very large networks and have fast rates of convergence. The effectiveness of the modeling approach and fitting methods is demonstrated on a number of simulated and real world network datasets.

Friday, 3/3/2017, 11:00AM
Topic: Geometry of Neural Networks
UCLA Statistics/Mathematics Seminar

Faculty Center – Cypress Room
Guido Montufar, Postdoctoral Scientist
Max Planck Institute for Mathematics in the Sciences & Leipzig University

Deep Learning is one of the most successful machine learning approaches to artificial intelligence. In this talk I discuss the geometry of neural networks as a way to study the success of Deep Learning at a mathematical level and to develop a theoretical basis for making further advances, especially in situations with limited amounts of labeled data and challenging problems in reinforcement learning. I present some recent results on the representational power of neural networks. Then I demonstrate how the capacity of a neural network can be aligned with the structure of perception-action problems in order to obtain more efficient learning systems.

Thursday, 3/2/2017, 1:30 PM
Topic: Computational and Statistical Convergence for Graph Estimation: Just Relax
UCLA Statistics/Mathematics Seminar

Faculty Center – Cypress Room
Shuheng Zhou, Assistant Professor
Department of Statistics
University of Michigan

The general theme of my research in recent years is spatio-temporal modeling and sparse recovery with high dimensional data under measurement error. In this talk, I will discuss several computational and statistical convergence results on graph and sparse vector recovery problems. Our analysis reveals interesting connections between computational and statistical efficiency and the concentration of measure phenomenon in random matrix theory. Our methods are applicable to many application domains such as neuroscience, geoscience and spatio-temporal modeling, genomics, and network data analysis. I will present theory, simulation and data examples. Part of this talk is based on joint work with Mark Rudelson.

Tuesday, 2/28/2017, 2:00 PM—3:00 PM
Topic: Convexified Modularity Maximization for Degree-Corrected Stochastic Block Models

Physics and Astronomy Building 1434A
Xiaodong Li
Department of Statistics
UC Davis

The stochastic block model (SBM) is a popular framework for studying community detection in networks. This model is limited by the assumption that all nodes in the same community are statistically equivalent and have equal expected degrees. The degree-corrected stochastic block model (DCSBM) is a natural extension of SBM that allows for degree heterogeneity within communities. This paper proposes a convexified modularity maximization approach for estimating the hidden communities under DCSBM. This approach is based on a convex programming relaxation of the classical (generalized) modularity maximization formulation, followed by a novel doubly-weighted `1-norm k-median procedure. In view of a novel degree-corrected density gap condition, we establish non-asymptotic theoretical guarantees for both approximate clustering and perfect clustering. In the special case of SBM, these theoretical results match the best- known performance guarantees of computationally feasible algorithms. Numerically, we provide an efficient implementation of our algorithm, which is applied to both synthetic and real-world networks. Experiment results show that our method enjoys competitive empirical performance compared to the state-of-the-art tractable methods in the literature. This is a joint work with Yudong Chen and Jiaming Xu.

Friday, 2/24/2017, 11:00 AM
Combinatorial Inference
UCLA Statistics/Mathematics Seminar

Faculty Center, Cypress Room
Han Liu
Assistant Professor
Princeton University, Department of Operations Research and Financial Engineering

We propose a new family of combinatorial inference problems for graphical models. Unlike classical statistical inference where the main interest is point estimation or parameter testing of Euclidean parameters, combinatorial inference aims at testing the global structure of the underlying graph. Examples include testing the graph connectivity, the presence of a cycle of certain size, or the maximum degree of the graph. To begin with, we develop a unified theory for the fundamental limits of a large family of combinatorial inference problems. We propose new structural packing entropies to characterize how the complexity of combinatorial graph structures impacts the corresponding minimax lower bounds. On the other hand, we propose a family of practical structural testing algorithms to match the obtained lower bounds. We use a case study of brain network analysis to illustrate the usefulness of these proposed methods.

Thursday, 2/23/2017, 1:30 PM
Topic: ePCA: Exponential family PCA
UCLA Statistics/Mathematics Seminar

Faculty Center, Cypress Room
Edgar Dobriban
PhD Candidate
Stanford University, Department of Statistics

Many applications, such as photon-limited imaging and genomics, involve large datasets with entries from exponential family distributions. It is of interest to estimate the covariance structure and principal components of the noiseless distribution. Principal Component Analysis (PCA), the standard method for this setting, can be inefficient for non-Gaussian noise. In this talk we present ePCA, a methodology for PCA on exponential family distributions. ePCA involves the eigendecomposition of a new covariance matrix estimator, constructed in a deterministic non-iterative way using moment calculations, shrinkage, and random matrix theory. We provide several theoretical justifications for our estimator, including the Marchenko-Pastur law in high dimensions. We illustrate ePCA by denoising molecular diffraction maps obtained using photon-limited X-ray free electron laser (XFEL) imaging. This is joint work with Lydia T. Liu and Amit Singer.

Tuesday, 2/21/2017, 2:00 PM—3:00 PM
A New Theory of Exploration in Reinforcement Learning with Function Approximation

Physics and Astronomy Building 1434A
Alekh Agarwal
Microsoft Research

This talk considers a core question in reinforcement learning (RL): How can we tractably solve sequential decision making problems where the learning agent receives rich observations?

We begin with a new model called Contextual Decision Processes (CDPs) for studying such problems, and show that it encompasses several prior setups to study RL such as MDPs and POMDPs. Several special cases of CDPs are, however, known to be provably intractable in their sample complexities. To overcome this challenge, we further propose a structural property of such processes, called the Bellman Rank. We find that the Bellman Rank of a CDP (and an associated class of functions) provides an intuitive measure of the hardness of a problem in terms of sample complexity—that is the number of samples needed by an agent to discover a near optimal policy for the CDP. In particular, we propose an algorithm, whose sample complexity scales with the Bellman Rank of the process, and is completely independent of the size of the observation space of the agent unlike most prior results. We also show that our techniques are robust to our modeling assumptions, and make connections to several known results as well as highlight novel consequences of our results.

This talk is based on joint work with Nan Jiang, Akshay Krishnamurthy, John Langford and Rob Schapire.

Tuesday, 2/14/2017, 2:00 PM—3:00 PM
Random Matrices with Heavy-Tailed Entries: Tight Mean Estimators and Applications

Physics and Astronomy Building 1434A
Stanislav Minsker
Department of Mathematics

Estimation of the covariance matrix has attracted significant attention of the statistical research community over the years, partially due to important applications such as Principal Component Analysis. However, frequently used empirical covariance estimator (and its modifications) is very sensitive to outliers in the data. As P. Huber wrote in 1964, “…This raises a question which could have been asked already by Gauss, but which was, as far as I know, only raised a few years ago (notably by Tukey): what happens if the true distribution deviates slightly from the assumed normal one? As is now well known, the sample mean then may have a catastrophically bad performance…” Motivated by this question, we develop a new estimator of the (element-wise) mean of a random matrix, which includes covariance estimation problem as a special case. Assuming that the entries of a matrix possess only finite second moment, this new estimator admits sub-Gaussian or sub-exponential concentration around the unknown mean in the operator norm. We will present extensions of our approach to matrix-valued U-statistics, as well as applications to covariance estimation and matrix completion problems.

Part of the talk will be based on a joint work with Xiaohan Wei.

Tuesday, 2/7/2017, 2:00 PM—3:00 PM
Matrix Completion, Saddlepoints, and Gradient Descent

Physics and Astronomy Building 1434A
Jason Lee
Data Sciences and Operations Department

Matrix completion is a fundamental machine learning problem with wide applications in collaborative filtering and recommender systems. Typically, matrix completion are solved by non-convex optimization procedures, which are empirically extremely successful. We prove that the symmetric matrix completion problem has no spurious local minima, meaning all local minima are also global. Thus the matrix completion objective has only saddlepoints an global minima.

Next, we show that saddlepoints are easy to avoid for even Gradient Descent — arguably the simplest optimization procedure. We prove that with probability 1, randomly initialized Gradient Descent converges to a local minimizer. The same result holds for a large class of optimization algorithms including proximal point, mirror descent, and coordinate descent.

Tuesday, 1/31/2017, 2:00 PM—3:00 PM
Latent Variable Models for Inferring Structure in Genomic Data

Physics and Astronomy Building 1434A
Sriram Sankararaman
Department of Computer Science

Genomic analyses have revealed that admixture, a process in which populations mix and exchange genes, has been a common process through human evolution. Understanding the genomic ancestry of an admixed individual is an important step in genomic medicine, social science and evolutionary biology. To this end, we need to develop statistical models that are detailed enough to capture the key biological signals while permitting efficient inference across large-scale genomic datasets.

In this talk, I will describe two inferential problems that arise in the analysis of admixed genomes and describe how latent variable models provide a natural framework for inference. The first, local ancestry inference , aims to infer the ancestry at every position along an individual’s genome. I will describe latent variable models for local ancestry inference that captures the fine-scale correlation structure. We show that this model is highly accurate while being computationally efficient for genomic datasets. The second problem is focussed on inferring the genomic ancestry of the parents of an admixed individual. Here, I will introduce the framework of pooled semi-Markov processes and describe efficient inference algorithms in this framework that enable us to accurately infer the genomic ancestries of parents from their offspring’s genome.

Finally, I will show how these models are being applied to learn about genetics of complex diseases as well as to elucidate human history and mating patterns.

Tuesday, 1/24/2017, 2:00 PM—3:00 PM: A Picture of the Energy Landscape of Deep Neural Networks

Physics and Astronomy Building 1434A
Pratik Chaudhari
Department of Computer Science

Training deep networks with complex architectures requires very careful tuning of a myriad of parameters: batch-size, learning rate schedules and momentum during the optimization process as well as techniques to improve generalization, viz. \ell_2 regularization, dropout, batch-normalization, data augmentation etc. Is there a way to disentangle the effect of these techniques, understand their necessity and maybe even improve upon them? In this talk, we will focus on the complexity and geometry of the energy landscape of prototypical deep networks as a means to answer these questions. We shall first paint a picture of the complexity of the underlying optimization problem using ideas from statistical physics such as spin glasses. The geometry of the energy landscape is connected to the generalization performance of deep networks. For instance, empirically, local minima found by SGD have a large proportion of almost-zero eigenvalues in the Hessian with very few positive or negative eigenvalues. We will exploit this observation to construct an algorithm named Entropy-SGD, that maximizes a “local free energy” (partition function) and favors well-generalizable solutions in flat regions of the energy landscape while simultaneously avoiding sharp poorly-generalizable — although possibly deep — valleys. We will discuss connections of this algorithm with belief propagation and ensemble learning. Lastly, we will show experimental results on modern convolutional and recurrent neural networks that demonstrate that Entropy-SGD compares favorably to state-of-the-art techniques in terms of both generalization error and training time.
Speaker’s BIO:

Tuesday, 1/17/2017, 2:00 PM—3:00 PM: Tractable Learning in Structured Probability Spaces

Physics and Astronomy Building 1434A
Guy Van den Broeck
Department of Computer Science

Many approaches have been introduced for learning probabilistic models, depending on various data characteristics. In this talk, I will introduce an orthogonal class of machine learning problems that we have been investigating at UCLA, which have not been treated as systematically before. In these problems, one has access to Boolean constraints that characterize examples which are known to be impossible (e.g., due to known domain physics). The task is then to learn a tractable probabilistic model that is guaranteed to assign a zero probability to each impossible example.

I will describe a new class of Arithmetic Circuits, the PSDD, for addressing this class of learning problems. The PSDD is based on advances from both machine learning and logical reasoning and can be learned under Boolean constraints. I will also provide a number of results on learning PSDDs. First, I will contrast PSDD learning with approaches that ignore known constraints, showing how it can learn more accurate models. Second, I will show that PSDDs can be utilized to learn, in a domain-independent manner, distributions over combinatorial objects, such as rankings, game traces and routes on a map. Third, I will show how PSDDs can be learned from a new type of datasets, in which examples are specified using arbitrary Boolean expressions. A number of preliminary case studies will be illustrated throughout the talk, including the unsupervised learning of preference rankings and the supervised learning of classifiers for routes and game traces.

Tuesday, 1/10/2017, 2:00 PM—3:00 PM: Flexible and Interpretable Regression Using Convex Penalties

Physics and Astronomy Building 1434A
Daniela Witten
Department of Statistics and Biostatistics
University of Washington

We consider the problem of fitting a regression model that is both flexible and interpretable. We propose two procedures for this task: the Fused Lasso Additive Model (FLAM), which is an additive model of piecewise constant fits; and Convex Regression with Interpretable Sharp Partitions (CRISP), which extends FLAM to allow for non-additivity. Both FLAM and CRISP are the solutions to convex optimization problems that can be efficiently solved. We show that FLAM and CRISP outperform competitors, such as sparse additive models (Ravikumar et al, 2009), CART (Breiman et al, 1984), and thin plate splines (Duchon, 1977), in a range of settings. We propose unbiased estimators for the degrees of freedom of FLAM and CRISP, which allow us to characterize their complexity. This is joint work with Ashley Petersen and Noah Simon at University of Washington.