Physics and Astronomy Building 1434A
Jun Liu, Professor
Department of Statistics, Harvard University
Simultaneously finding multiple influential variables and controlling the false discovery rate (FDR) in linear regression models is a fundamental problem with a long history. Researchers have recently proposed and examined several innovative approaches built on the idea of creating “knockoff” variables (like spike-ins in biological experiments) to control the FDR. An alternative to creating knockoffs is the classical statistical idea of introducing perturbations and examining their impact. We introduce here a perturbation-based Gaussian Mirror (GM) method, which creates for each predictor variable a pair of perturbed “mirror variables” by adding and subtracting a randomly generated Gaussian random variable, and then proceeds with a chosen regression method, such as ordinary least squares or the Lasso. The mirror variables naturally lead to a test statistic that is highly effective for controlling the FDR. The proposed GM method does not require strong conditions on the covariates, nor any knowledge of the noise level or of the relative magnitudes of the dimension p and the sample size n. We observe that the GM method is more powerful than many existing methods in selecting important variables subject to FDR control, especially when the covariates are highly correlated. Additionally, we provide a method to reliably estimate a confidence interval and an upper bound for the number of false discoveries. If time permits, I will also discuss a simpler bootstrap-type perturbation method for estimating FDRs, which is also more powerful than knockoff methods when the predictors are reasonably correlated. The presentation is based on joint work with Xing Xin and Chenguang Dai.
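As a rough illustration of the mirror-variable construction described in the abstract, the following Python sketch builds, for each predictor, the pair x_j + c z and x_j - c z with z drawn from a standard Gaussian, and fits ordinary least squares. The abstract does not specify the perturbation scale, the test statistic, or the selection threshold, so the choices below (the scale c, the statistic |b+ + b-| - |b+ - b-|, and the data-driven cutoff) are assumptions made for illustration only, and the function names gaussian_mirror_stats and select_by_fdr are hypothetical. It should be read as a minimal sketch under those assumptions, not as the speaker's implementation.

```python
import numpy as np


def gaussian_mirror_stats(X, y, rng):
    """Return one mirror statistic per column of X (illustrative, OLS only)."""
    n, p = X.shape
    stats = np.empty(p)
    for j in range(p):
        z = rng.standard_normal(n)           # Gaussian perturbation for x_j
        others = np.delete(X, j, axis=1)     # all predictors except x_j

        def residual(v):
            # Part of v not explained by the other predictors.
            coef, *_ = np.linalg.lstsq(others, v, rcond=None)
            return v - others @ coef

        # Assumed scale: balance the perturbation against x_j after
        # removing the other predictors from both.
        c = np.linalg.norm(residual(X[:, j])) / np.linalg.norm(residual(z))

        # Replace x_j by its two mirror variables and refit by OLS.
        design = np.column_stack([X[:, j] + c * z, X[:, j] - c * z, others])
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        b_plus, b_minus = beta[0], beta[1]

        # Assumed statistic: large and positive for a real signal,
        # roughly symmetric about zero for a null variable.
        stats[j] = abs(b_plus + b_minus) - abs(b_plus - b_minus)
    return stats


def select_by_fdr(stats, q):
    """Select variables using an assumed symmetry-based FDP estimate."""
    for t in np.sort(np.abs(stats)):
        if t <= 0:
            continue
        # Negative statistics stand in for false positives above t.
        fdp_hat = np.sum(stats <= -t) / max(np.sum(stats >= t), 1)
        if fdp_hat <= q:
            return np.flatnonzero(stats >= t)
    return np.array([], dtype=int)


# Small synthetic example: 3 true signals among 10 predictors.
rng = np.random.default_rng(0)
X = rng.standard_normal((300, 10))
beta_true = np.zeros(10)
beta_true[:3] = 3.0
y = X @ beta_true + rng.standard_normal(300)

mirror = gaussian_mirror_stats(X, y, rng)
print("selected indices:", select_by_fdr(mirror, q=0.2))
```

The sketch uses ordinary least squares, so it needs n larger than p + 1; the abstract notes that the method can also be paired with the Lasso, which this example does not attempt.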
To find out more about our speaker, please visit: http://sites.fas.harvard.edu/~junliu/