Past Seminars

Tuesday, 4/25/2017, 2:00PM – 3:00pm
Topic: Asynchronous Parallel Algorithms for Fixed-Point Problems and Large-Scale Optimization

Kinsey Pavilion 1240B
Wotao Yin, Professor
Department of Mathematics, UCLA

wotaoYINMany problems reduce to the fixed-point problem of solving x=T(x). This talk discusses a coordinate-friendly structure in the operator T that prevails in many optimization problems and enables highly efficient asynchronous algorithms. By “asynchronous”, we mean that the algorithm runs on multiple threads and each thread computes with possibly delayed information from the other threads. The threads do not coordinate their iterations. On modern computer architecture, asynchronous algorithms can be order-of-magnitude faster that the standard (synchronous) parallel algorithms! We demonstrate how to solve large-scale applications in machine learning, image processing, portfolio optimization, and second-order cone programs with asynchronous algorithms.

We also present a fundamental theoretical result: as long as T has a fixed point and is nonexpansive, the asynchronous coordinate-update algorithm converges to a fixed point under either a bounded delay or certain kinds of unbounded delays. The operator does not have to be contractive.

This is joint work with many current and former students.

Tuesday, 4/11/2017, 2:00PM – 3:00pm
Topic: Scalable Bayesian Models of Interacting Time Series

Boelter Hall 5436
Emily Fox, Amazon Professor of Machine Learning
Department of Statistics, University of Washington

Data streams of increasing complexity and scale are being collected in a variety of fields ranging from neuroscience and genomics to e-commerce. Modeling the intricate relationships between the large collection of series can lead to increased predictive performance and domain-interpretable structures. For scalability, it is crucial to discover and exploit sparse dependencies between the data streams. Such representational structures for independent data sources have been studied extensively, but have received limited attention in the context of time series. In this talk, we present a series of models for capturing such sparse dependencies via clustering, graphical models, and low-dimensional embeddings of time series. We explore these methods in a variety of applications, including house price modeling and inferring networks in the brain. We likewise discuss methods for scaling Bayesian inference to large datasets.

We then turn to observed interaction data, and briefly touch upon how to devise statistical network models that capture important network features like sparsity of edge connectivity. Within our Bayesian framework, a key insight is to move to a continuous-space representation of the graph, rather than the typical discrete adjacency matrix structure. We demonstrate our methods on a series of real-world networks with up to hundreds of thousands of nodes and millions of edges.


Emily Fox is currently the Amazon Professor of Machine Learning in the Statistics Department at the University of Washington. She received a S.B. in 2004 and Ph.D. in 2009 from the Department of Electrical Engineering and Computer Science at MIT. She has been awarded a Presidential Early Career Award for Scientists and Engineers (PECASE, 2017), Sloan Research Fellowship (2015), an ONR Young Investigator award (2015), an NSF CAREER award (2014), the Leonard J. Savage Thesis Award in Applied Methodology (2009), and the MIT EECS Jin-Au Kong Outstanding Doctoral Thesis Prize (2009). Her research interests are in large-scale Bayesian dynamic modeling and computations.

Wednesday, 3/15/2017, 1:30PM
Topic: Metareasoning and Mental Simulation
UCLA Statistics/Communication Studies Seminar

4242 Young Hall (JIFRESSE Seminar Room)
Jessica Hamrick, Ph.D. Candidate
Department of Psychology, University of California, Berkeley

At any given moment, how should an intelligent agent decide what to think about, how to think about it, and how long to think for? My research attempts to answer these questions by focusing on the best examples of intelligent agents that we have: humans. In particular, I study how people use their “mental simulations”, which can be thought of as samples from a rich generative model of the world. I show how people adaptively use their mental simulations to learn new things about the world; that they choose which simulations to run based on which they think will be more informative; and that they allocate their cognitive resources to spend less time on easy problems and more time on hard problems. Based on these results, I will illustrate how statistics, machine learning, and cognitive science can complement one another. On the one hand, I will show how ideas from cognitive science can inform and inspire new approaches in machine learning; on the other hand, I will discuss how the mathematical models of human cognition that I develop in my research can be used to build agents that are better able to reason about and communicate with human collaborators.

Tuesday, 3/14/2017, 2:00 PM—3:00 PM
Topic: Human Behavior in Techno-Social Systems

Physics and Astronomy Building 1434A
Emilio Ferrara, Research Assistant Professor
Department of Mathematics

The increasing availability of data across different socio-technical systems, such as online social networks, social media, and mobile phone networks, presents novel challenges and intriguing research opportunities. As more online services permeate through our everyday life and as data from various domains are connected and integrated with each other, the boundary between the real and the online worlds becomes blurry. Such data convey both online and offline activities of people, as well as multiple time scales and resolutions. In this talk, I’ll discuss my research efforts aimed at characterizing and predicting human behavior and activities in techno-social worlds: starting by discussing network structure and information spreading on large online social networks, I’ll move toward characterizing entire online conversations, such as those around big real-world events, to capture the dynamics driving the emergence of collective attention. I’ll describe a machine learning framework leveraging these insights to detect promoted campaigns that mimic grassroots conversation. Aiming at learning the signature of abuse at the level of the single individuals, I’ll illustrate the challenges posed by characterizing human activity as opposed to that of synthetic entities (social bots) that attempt emulate us, to persuade, smear, tamper or deceive. I’ll then explore a variety of such applications and problems to study of emotions, cognitive heuristics, etc.

Bio. Dr. Emilio Ferrara is Research Assistant Professor at the University of Southern California, Research Leader at the USC Information Sciences Institute, and Principal Investigator at the USC/ISI Machine Intelligence and Data Science (MINDS) research group. Ferrara’s research interests include modeling and predicting individual behavior in techno-social systems, characterize information diffusion, and predict crime and abuse in such environments. He has held various research positions in institutions in Italy, Austria, and UK (2009-2012). Before joining USC in 2015, he was research faculty in the School of Informatics and Computing of Indiana University (2012-2015). Ferrara holds a Ph.D. in Computer Science from University of Messina (Italy), and has published over 80 articles on social networks, machine learning, and network science, appeared in venues like Proceeding of the National Academy of Sciences, Communications of the ACM, Physical Review Letters, and several ACM and IEEE transactions and conferences. His research on social networks has been featured on the major news outlets (TIME, BBC, The New York Times, etc.) and tech magazines (MIT Technology Review, Vice, Mashable, New Scientist, etc). He was named IBM Watson Big Data Influencer in 2015, and Maptive’s Big Data Expert to Follow in 2016. He received the 2016 DARPA Young Faculty Award, and the 2016 Complex Systems Society Junior Scientific Award “For outstanding contributions to complex systems sciences.” His research is supported by DARPA, IARPA, Air Force, and Office of Naval Research.

Friday, 3/10/2017, 1:30PM
Topic: Visual Roots of Social Cognition
UCLA Statistics/Communication Studies Seminar

4242 Young Hall (JIFRESSE Seminar Room)
Tao Gao, Research Scientist
Massachusetts Institute of Technology, Computer Vision Lab, GE Research

Intelligent human communication relies on our remarkable abilities of understanding others’ mental states, including beliefs, desires, goals and intentions. As a case study of “Theory of Mind” (ToM), I have been exploring the perceptual roots of social cognition — focusing on “reverse-engineering” human abilities to recognize animacy, agency, and intentionality in visual scenes. My work in this area touches on themes from cognitive science, computer vision, social psychology, and neuroscience. In my talk, I will describe several different kinds of studies that contribute to this overall research project, including explorations of (1) just what kinds of cues trigger this form of perception; (2) how the perception of these social properties influences subsequent cognition and action; and (3) how to build models of social perception through computer vision and machine learning. Collectively, these projects show how the perception of animacy and intentionality is deeply wired into our minds, how it can be modeled for higher-level social communication and collaboration.

Wednesday, 3/08/2017, 1:30PM
Topic: Towards Natural, Intelligent, and Collaborative Human-Agent Communication
UCLA Statistics/Communication Studies Seminar

4242 Young Hall (JIFRESSE Seminar Room)
Joyce Chai, Professor
Department of Computer Science and Engineering, Michigan State University

Enabling situated communication with artificial agents (e.g., robots) faces many challenges. Humans and artificial agents have different levels of linguistic, task, and world knowledge. In situated communication with robots, although humans and robots are co-present in a shared physical environment, they have mismatched capabilities in perceiving and interpreting the shared environment and joint tasks. All of these significantly jeopardize the common ground between partners, making language-based communication extremely difficult. To address this problem, we have developed several computational collaborative models for language processing which are motivated by cooperative principles in human communication. We have also developed approaches to allow artificial agents to continuously acquire knowledge through communicating with humans to establish common ground.  In this talk, I will give an introduction to this research effort and particularly focus on referential communication and interactive verb acquisition and action learning.

Tuesday, 3/7/2017, 2:00 PM—3:00 PM
Topic: Exploration of Large Networks Via Latent Space Modeling

Physics and Astronomy Building 1434A
Zongming Ma
Department of Statistics
University of Pennsylvania

Latent space models are effective tools for statistical network data analysis. The present paper presents two fitting algorithms for a broad class of latent space models that can effectively model network characteristics such as degree heterogeneity, transitivity, homophily, etc. The methods are motivated by inner-product models, but are readily applicable to more general models that allow latent vectors to affect edge formation in flexible ways. Both methods are scalable to very large networks and have fast rates of convergence. The effectiveness of the modeling approach and fitting methods is demonstrated on a number of simulated and real world network datasets.

Friday, 3/3/2017, 11:00AM
Topic: Geometry of Neural Networks
UCLA Statistics/Mathematics Seminar

Faculty Center – Cypress Room
Guido Montufar, Postdoctoral Scientist
Max Planck Institute for Mathematics in the Sciences & Leipzig University

Deep Learning is one of the most successful machine learning approaches to artificial intelligence. In this talk I discuss the geometry of neural networks as a way to study the success of Deep Learning at a mathematical level and to develop a theoretical basis for making further advances, especially in situations with limited amounts of labeled data and challenging problems in reinforcement learning. I present some recent results on the representational power of neural networks. Then I demonstrate how the capacity of a neural network can be aligned with the structure of perception-action problems in order to obtain more efficient learning systems.

Thursday, 3/2/2017, 1:30 PM
Topic: Computational and Statistical Convergence for Graph Estimation: Just Relax
UCLA Statistics/Mathematics Seminar

Faculty Center – Cypress Room
Shuheng Zhou, Assistant Professor
Department of Statistics
University of Michigan

The general theme of my research in recent years is spatio-temporal modeling and sparse recovery with high dimensional data under measurement error. In this talk, I will discuss several computational and statistical convergence results on graph and sparse vector recovery problems. Our analysis reveals interesting connections between computational and statistical efficiency and the concentration of measure phenomenon in random matrix theory. Our methods are applicable to many application domains such as neuroscience, geoscience and spatio-temporal modeling, genomics, and network data analysis. I will present theory, simulation and data examples. Part of this talk is based on joint work with Mark Rudelson.

Tuesday, 2/28/2017, 2:00 PM—3:00 PM
Topic: Convexified Modularity Maximization for Degree-Corrected Stochastic Block Models

Physics and Astronomy Building 1434A
Xiaodong Li
Department of Statistics
UC Davis

The stochastic block model (SBM) is a popular framework for studying community detection in networks. This model is limited by the assumption that all nodes in the same community are statistically equivalent and have equal expected degrees. The degree-corrected stochastic block model (DCSBM) is a natural extension of SBM that allows for degree heterogeneity within communities. This paper proposes a convexified modularity maximization approach for estimating the hidden communities under DCSBM. This approach is based on a convex programming relaxation of the classical (generalized) modularity maximization formulation, followed by a novel doubly-weighted `1-norm k-median procedure. In view of a novel degree-corrected density gap condition, we establish non-asymptotic theoretical guarantees for both approximate clustering and perfect clustering. In the special case of SBM, these theoretical results match the best- known performance guarantees of computationally feasible algorithms. Numerically, we provide an efficient implementation of our algorithm, which is applied to both synthetic and real-world networks. Experiment results show that our method enjoys competitive empirical performance compared to the state-of-the-art tractable methods in the literature. This is a joint work with Yudong Chen and Jiaming Xu.

Friday, 2/24/2017, 11:00 AM
Combinatorial Inference
UCLA Statistics/Mathematics Seminar

Faculty Center, Cypress Room
Han Liu
Assistant Professor
Princeton University, Department of Operations Research and Financial Engineering

We propose a new family of combinatorial inference problems for graphical models. Unlike classical statistical inference where the main interest is point estimation or parameter testing of Euclidean parameters, combinatorial inference aims at testing the global structure of the underlying graph. Examples include testing the graph connectivity, the presence of a cycle of certain size, or the maximum degree of the graph. To begin with, we develop a unified theory for the fundamental limits of a large family of combinatorial inference problems. We propose new structural packing entropies to characterize how the complexity of combinatorial graph structures impacts the corresponding minimax lower bounds. On the other hand, we propose a family of practical structural testing algorithms to match the obtained lower bounds. We use a case study of brain network analysis to illustrate the usefulness of these proposed methods.

Thursday, 2/23/2017, 1:30 PM
Topic: ePCA: Exponential family PCA
UCLA Statistics/Mathematics Seminar

Faculty Center, Cypress Room
Edgar Dobriban
PhD Candidate
Stanford University, Department of Statistics

Many applications, such as photon-limited imaging and genomics, involve large datasets with entries from exponential family distributions. It is of interest to estimate the covariance structure and principal components of the noiseless distribution. Principal Component Analysis (PCA), the standard method for this setting, can be inefficient for non-Gaussian noise. In this talk we present ePCA, a methodology for PCA on exponential family distributions. ePCA involves the eigendecomposition of a new covariance matrix estimator, constructed in a deterministic non-iterative way using moment calculations, shrinkage, and random matrix theory. We provide several theoretical justifications for our estimator, including the Marchenko-Pastur law in high dimensions. We illustrate ePCA by denoising molecular diffraction maps obtained using photon-limited X-ray free electron laser (XFEL) imaging. This is joint work with Lydia T. Liu and Amit Singer.

Tuesday, 2/21/2017, 2:00 PM—3:00 PM
Matrix Completion, Saddlepoints, and Gradient Descent

Physics and Astronomy Building 1434A
Alekh Agarwal
Microsoft Research

This talk considers a core question in reinforcement learning (RL): How can we tractably solve sequential decision making problems where the learning agent receives rich observations?

We begin with a new model called Contextual Decision Processes (CDPs) for studying such problems, and show that it encompasses several prior setups to study RL such as MDPs and POMDPs. Several special cases of CDPs are, however, known to be provably intractable in their sample complexities. To overcome this challenge, we further propose a structural property of such processes, called the Bellman Rank. We find that the Bellman Rank of a CDP (and an associated class of functions) provides an intuitive measure of the hardness of a problem in terms of sample complexity—that is the number of samples needed by an agent to discover a near optimal policy for the CDP. In particular, we propose an algorithm, whose sample complexity scales with the Bellman Rank of the process, and is completely independent of the size of the observation space of the agent unlike most prior results. We also show that our techniques are robust to our modeling assumptions, and make connections to several known results as well as highlight novel consequences of our results.

This talk is based on joint work with Nan Jiang, Akshay Krishnamurthy, John Langford and Rob Schapire.

Tuesday, 2/14/2017, 2:00 PM—3:00 PM
Random Matrices with Heavy-Tailed Entries: Tight Mean Estimators and Applications

Physics and Astronomy Building 1434A
Stanislav Minsker
Department of Mathematics

Estimation of the covariance matrix has attracted significant attention of the statistical research community over the years, partially due to important applications such as Principal Component Analysis. However, frequently used empirical covariance estimator (and its modifications) is very sensitive to outliers in the data. As P. Huber wrote in 1964, “…This raises a question which could have been asked already by Gauss, but which was, as far as I know, only raised a few years ago (notably by Tukey): what happens if the true distribution deviates slightly from the assumed normal one? As is now well known, the sample mean then may have a catastrophically bad performance…” Motivated by this question, we develop a new estimator of the (element-wise) mean of a random matrix, which includes covariance estimation problem as a special case. Assuming that the entries of a matrix possess only finite second moment, this new estimator admits sub-Gaussian or sub-exponential concentration around the unknown mean in the operator norm. We will present extensions of our approach to matrix-valued U-statistics, as well as applications to covariance estimation and matrix completion problems.

Part of the talk will be based on a joint work with Xiaohan Wei.

Tuesday, 2/7/2017, 2:00 PM—3:00 PM
Matrix Completion, Saddlepoints, and Gradient Descent

Physics and Astronomy Building 1434A
Jason Lee
Data Sciences and Operations Department

Matrix completion is a fundamental machine learning problem with wide applications in collaborative filtering and recommender systems. Typically, matrix completion are solved by non-convex optimization procedures, which are empirically extremely successful. We prove that the symmetric matrix completion problem has no spurious local minima, meaning all local minima are also global. Thus the matrix completion objective has only saddlepoints an global minima.

Next, we show that saddlepoints are easy to avoid for even Gradient Descent — arguably the simplest optimization procedure. We prove that with probability 1, randomly initialized Gradient Descent converges to a local minimizer. The same result holds for a large class of optimization algorithms including proximal point, mirror descent, and coordinate descent.

Tuesday, 1/31/2017, 2:00 PM—3:00 PM
Latent Variable Models for Inferring Structure in Genomic Data

Physics and Astronomy Building 1434A
Sriram Sankararaman
Department of Computer Science

Genomic analyses have revealed that admixture, a process in which populations mix and exchange genes, has been a common process through human evolution. Understanding the genomic ancestry of an admixed individual is an important step in genomic medicine, social science and evolutionary biology. To this end, we need to develop statistical models that are detailed enough to capture the key biological signals while permitting efficient inference across large-scale genomic datasets.

In this talk, I will describe two inferential problems that arise in the analysis of admixed genomes and describe how latent variable models provide a natural framework for inference. The first, local ancestry inference , aims to infer the ancestry at every position along an individual’s genome. I will describe latent variable models for local ancestry inference that captures the fine-scale correlation structure. We show that this model is highly accurate while being computationally efficient for genomic datasets. The second problem is focussed on inferring the genomic ancestry of the parents of an admixed individual. Here, I will introduce the framework of pooled semi-Markov processes and describe efficient inference algorithms in this framework that enable us to accurately infer the genomic ancestries of parents from their offspring’s genome.

Finally, I will show how these models are being applied to learn about genetics of complex diseases as well as to elucidate human history and mating patterns.

Tuesday, 1/24/2017, 2:00 PM—3:00 PM: A Picture of the Energy Landscape of Deep Neural Networks

Physics and Astronomy Building 1434A
Pratik Chaudhari
Department of Computer Science

Training deep networks with complex architectures requires very careful tuning of a myriad of parameters: batch-size, learning rate schedules and momentum during the optimization process as well as techniques to improve generalization, viz. \ell_2 regularization, dropout, batch-normalization, data augmentation etc. Is there a way to disentangle the effect of these techniques, understand their necessity and maybe even improve upon them? In this talk, we will focus on the complexity and geometry of the energy landscape of prototypical deep networks as a means to answer these questions. We shall first paint a picture of the complexity of the underlying optimization problem using ideas from statistical physics such as spin glasses. The geometry of the energy landscape is connected to the generalization performance of deep networks. For instance, empirically, local minima found by SGD have a large proportion of almost-zero eigenvalues in the Hessian with very few positive or negative eigenvalues. We will exploit this observation to construct an algorithm named Entropy-SGD, that maximizes a “local free energy” (partition function) and favors well-generalizable solutions in flat regions of the energy landscape while simultaneously avoiding sharp poorly-generalizable — although possibly deep — valleys. We will discuss connections of this algorithm with belief propagation and ensemble learning. Lastly, we will show experimental results on modern convolutional and recurrent neural networks that demonstrate that Entropy-SGD compares favorably to state-of-the-art techniques in terms of both generalization error and training time.
Speaker’s BIO:

Tuesday, 1/17/2017, 2:00 PM—3:00 PM: Tractable Learning in Structured Probability Spaces

Physics and Astronomy Building 1434A
Guy Van den Broeck
Department of Computer Science

Many approaches have been introduced for learning probabilistic models, depending on various data characteristics. In this talk, I will introduce an orthogonal class of machine learning problems that we have been investigating at UCLA, which have not been treated as systematically before. In these problems, one has access to Boolean constraints that characterize examples which are known to be impossible (e.g., due to known domain physics). The task is then to learn a tractable probabilistic model that is guaranteed to assign a zero probability to each impossible example.

I will describe a new class of Arithmetic Circuits, the PSDD, for addressing this class of learning problems. The PSDD is based on advances from both machine learning and logical reasoning and can be learned under Boolean constraints. I will also provide a number of results on learning PSDDs. First, I will contrast PSDD learning with approaches that ignore known constraints, showing how it can learn more accurate models. Second, I will show that PSDDs can be utilized to learn, in a domain-independent manner, distributions over combinatorial objects, such as rankings, game traces and routes on a map. Third, I will show how PSDDs can be learned from a new type of datasets, in which examples are specified using arbitrary Boolean expressions. A number of preliminary case studies will be illustrated throughout the talk, including the unsupervised learning of preference rankings and the supervised learning of classifiers for routes and game traces.

Tuesday, 1/10/2017, 2:00 PM—3:00 PM: Flexible and Interpretable Regression Using Convex Penalties

Physics and Astronomy Building 1434A
Daniela Witten
Department of Statistics and Biostatistics
University of Washington

We consider the problem of fitting a regression model that is both flexible and interpretable. We propose two procedures for this task: the Fused Lasso Additive Model (FLAM), which is an additive model of piecewise constant fits; and Convex Regression with Interpretable Sharp Partitions (CRISP), which extends FLAM to allow for non-additivity. Both FLAM and CRISP are the solutions to convex optimization problems that can be efficiently solved. We show that FLAM and CRISP outperform competitors, such as sparse additive models (Ravikumar et al, 2009), CART (Breiman et al, 1984), and thin plate splines (Duchon, 1977), in a range of settings. We propose unbiased estimators for the degrees of freedom of FLAM and CRISP, which allow us to characterize their complexity. This is joint work with Ashley Petersen and Noah Simon at University of Washington.