High-Dimensional Heteroscedastic Regression with an Application to eQTL Data Analysis
Article first published online: 12 AUG 2011
© 2011, The International Biometric Society
Volume 68, Issue 1, pages 316–326, March 2012
How to Cite
Daye, Z. J., Chen, J. and Li, H. (2012), High-Dimensional Heteroscedastic Regression with an Application to eQTL Data Analysis. Biometrics, 68: 316–326. doi: 10.1111/j.1541-0420.2011.01652.x
- Issue published online: 23 MAR 2012
- Received November 2010. Revised May 2011. Accepted June 2011.
Keywords
- Generalized least squares
- Large p small n
- Model selection
- Sparse regression
- Variance estimation
Summary. We consider the problem of high-dimensional regression under nonconstant error variances. Despite being a common phenomenon in biological applications, heteroscedasticity has so far been largely ignored in high-dimensional analysis of genomic data sets. We propose a new methodology that allows nonconstant error variances for high-dimensional estimation and model selection. Our method incorporates heteroscedasticity by simultaneously modeling both the mean and variance components via a novel doubly regularized approach. Extensive Monte Carlo simulations indicate that our proposed procedure can result in better estimation and variable selection than existing methods when heteroscedasticity arises from outliers and from predictors that explain the error variances. Further, we demonstrate the presence of heteroscedasticity in an expression quantitative trait loci (eQTL) study of 112 yeast segregants and apply our method to it. The new procedure automatically accounts for heteroscedasticity in identifying the eQTLs that are associated with gene expression variations and leads to smaller prediction errors. These results demonstrate the importance of considering heteroscedasticity in eQTL data analysis.
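The summary does not state the model form or the fitting algorithm, so the following is only a minimal sketch of what regularizing both the mean and the variance could look like, not the authors' procedure. It assumes a log-linear variance model, an L1 penalty on each component, and a crude alternating scheme in which the variance coefficients are estimated by a lasso on log squared residuals. All function and parameter names (`weighted_lasso`, `fit_mean_variance`, `lam_mean`, `lam_var`) and the simulated data are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator used by the lasso coordinate update."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def weighted_lasso(X, y, w, lam, n_iter=100):
    """Coordinate descent for (1/2n) * sum_i w_i (y_i - x_i'b)^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    r = y.copy()                                  # current residual y - X b
    col_norm = (w[:, None] * X**2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            if col_norm[j] == 0.0:
                continue
            r += X[:, j] * b[j]                   # remove feature j's contribution
            rho = (w * X[:, j] * r).sum() / n
            b[j] = soft_threshold(rho, lam) / col_norm[j]
            r -= X[:, j] * b[j]                   # add the updated contribution back
    return b

def fit_mean_variance(X, y, lam_mean=0.1, lam_var=0.1, n_outer=5):
    """Alternate a weighted lasso for the mean with a lasso for the log-variance.

    Assumed model (not taken from the source): Var(error_i) = exp(alpha0 + x_i'alpha).
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)                       # center predictors
    yc = y - y.mean()                             # center response (no mean intercept)
    alpha0, alpha = np.log(np.var(yc)), np.zeros(p)
    beta = np.zeros(p)
    for _ in range(n_outer):
        w = np.exp(-(alpha0 + Xc @ alpha))        # inverse-variance weights
        w /= w.mean()                             # keep the penalty scale comparable
        beta = weighted_lasso(Xc, yc, w, lam_mean)
        z = np.log((yc - Xc @ beta) ** 2 + 1e-8)  # log squared residuals
        alpha0 = z.mean()
        alpha = weighted_lasso(Xc, z - alpha0, np.ones(n), lam_var)
    return beta, alpha0, alpha

# Simulated example: predictors 0 and 1 drive the mean, predictor 2 drives the variance.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[0], beta_true[1] = 2.0, -1.5
alpha_true = np.zeros(p)
alpha_true[2] = 1.0
y = X @ beta_true + np.exp(0.5 * (X @ alpha_true)) * rng.standard_normal(n)

beta_hat, alpha0_hat, alpha_hat = fit_mean_variance(X, y)
print("nonzero mean coefficients:", np.flatnonzero(beta_hat))
print("nonzero variance coefficients:", np.flatnonzero(alpha_hat))
```

Regressing log squared residuals is a simple surrogate for a likelihood-based variance fit: it shifts the variance intercept by a constant but leaves the slopes interpretable, which is enough to illustrate how separate penalties can select different predictors for the mean and for the error variance.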