Xiaotong Shen, Professor
School of Statistics, University of Minnesota
Location: Geology Building, Room 4660
Data perturbation is a technique for generating synthetic data by adding “noise” to original data. It has a wide range of applications, primarily in data security, yet it has not received much attention within data science. In this presentation, I will describe a fundamental principle of data perturbation that preserves distributional information, thus ensuring the validity of downstream statistical analysis and machine learning tasks while protecting data privacy. Applying this principle, we derive a scheme that allows a user to perturb data nonlinearly while meeting the requirements of both differential privacy and statistical analysis. The scheme yields credible statistical analysis and high predictive accuracy in machine learning tasks. Finally, I will highlight multiple facets of data perturbation through examples. This is joint work with B. Xuan and R. Shen.
Dr. Xiaotong T. Shen is the John Black Johnston Distinguished Professor in the College of Liberal Arts at the University of Minnesota. His areas of interest include machine learning and data science, high-dimensional inference, nonparametric and semi-parametric inference, causal graphical models, personalization, recommender systems, natural language processing and text mining, and nonconvex minimization. His current research effort is devoted to the further development of causal and constrained inference, structured learning, inference for black-box learners, and scalable analysis. The target application areas are biomedical sciences, artificial intelligence, and engineering.
Dr. Shen is a Fellow of the American Association for the Advancement of Science (AAAS), a Fellow of the American Statistical Association (ASA), a Fellow of the Institute of Mathematical Statistics (IMS), and an Elected Member of the International Statistical Institute (ISI). He won the Best Paper Award (with Pan and Xie) of the International Biometric Society in 2012. He is included in the list of “20 Data Science Professors to Know” by OnlineEngineeringPrograms.com.