High-dimensional heteroscedastic regression with an application to eQTL data analysis. (English) Zbl 1241.62152

Summary: We consider the problem of high-dimensional regression under nonconstant error variances. Although heteroscedasticity is common in biological applications, it has so far been largely ignored in high-dimensional analysis of genomic data sets. We propose a new methodology that allows nonconstant error variances in high-dimensional estimation and model selection. Our method incorporates heteroscedasticity by simultaneously modeling the mean and variance components via a novel doubly regularized approach. Extensive Monte Carlo simulations indicate that the proposed procedure can yield better estimation and variable selection than existing methods when heteroscedasticity arises from predictors that explain error variances and from outliers. Further, we demonstrate the presence of heteroscedasticity in an expression quantitative trait loci (eQTL) study of 112 yeast segregants and apply our method to it. The new procedure can automatically account for heteroscedasticity when identifying the eQTLs associated with gene expression variations, and it leads to smaller prediction errors. These results demonstrate the importance of considering heteroscedasticity in eQTL data analysis.
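
To make the idea of jointly regularizing the mean and the variance concrete, the following is a minimal sketch and not the authors' procedure. It assumes a log-linear variance model, y_i = x_i'beta + exp(x_i'alpha / 2) * eps_i, with L1 penalties on both beta and alpha; the summary does not state these details, so the model form, the penalties, and every function name here (soft_threshold, objective, update_mean, update_variance, fit) are illustrative assumptions. The sketch alternates a weighted-lasso coordinate-descent sweep for the mean with a backtracking proximal-gradient step for the variance coefficients.

```python
import math
import random

def soft_threshold(z, t):
    """Soft-thresholding operator, the proximal map of t * |.|."""
    return math.copysign(max(abs(z) - t, 0.0), z)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def objective(X, y, beta, alpha, lam1, lam2):
    """Assumed objective: Gaussian negative log-likelihood with
    log-variance x_i'alpha, plus L1 penalties on beta and alpha."""
    total = 0.0
    try:
        for xi, yi in zip(X, y):
            eta = dot(alpha, xi)              # log-variance of observation i
            r = yi - dot(beta, xi)            # mean residual
            total += 0.5 * eta + 0.5 * r * r * math.exp(-eta)
    except OverflowError:
        return math.inf                       # treated as a rejected candidate
    return total + lam1 * sum(map(abs, beta)) + lam2 * sum(map(abs, alpha))

def update_mean(X, y, beta, alpha, lam1):
    """One coordinate-descent sweep of the weighted lasso in beta (in place)."""
    w = [math.exp(-dot(alpha, xi)) for xi in X]   # inverse-variance weights
    resid = [yi - dot(beta, xi) for xi, yi in zip(X, y)]
    for j in range(len(beta)):
        col = [xi[j] for xi in X]
        denom = sum(wi * c * c for wi, c in zip(w, col))
        if denom == 0.0:
            continue
        # Partial residual: add back coordinate j's current contribution.
        rho = sum(wi * c * (ri + c * beta[j]) for wi, c, ri in zip(w, col, resid))
        new = soft_threshold(rho, lam1) / denom
        resid = [ri - c * (new - beta[j]) for ri, c in zip(resid, col)]
        beta[j] = new
    return beta

def update_variance(X, y, beta, alpha, lam1, lam2):
    """One proximal-gradient step in alpha, halving the step size until
    the penalized objective does not increase."""
    cur = objective(X, y, beta, alpha, lam1, lam2)
    grad = [0.0] * len(alpha)
    for xi, yi in zip(X, y):
        r2 = (yi - dot(beta, xi)) ** 2
        g = 0.5 * (1.0 - r2 * math.exp(-dot(alpha, xi)))
        for j, v in enumerate(xi):
            grad[j] += g * v
    step = 1.0 / len(X)
    while step > 1e-12:
        cand = [soft_threshold(a - step * g, step * lam2)
                for a, g in zip(alpha, grad)]
        if objective(X, y, beta, cand, lam1, lam2) <= cur:
            return cand
        step *= 0.5
    return alpha

def fit(X, y, lam1, lam2, n_iter=50):
    """Alternate the mean and variance updates from a zero start."""
    p = len(X[0])
    beta, alpha = [0.0] * p, [0.0] * p
    for _ in range(n_iter):
        beta = update_mean(X, y, beta, alpha, lam1)
        alpha = update_variance(X, y, beta, alpha, lam1, lam2)
    return beta, alpha

# Simulated data (hypothetical): predictor 0 drives the mean,
# predictor 1 drives the error variance.
random.seed(0)
n, p = 80, 4
X = [[random.uniform(-1, 1) for _ in range(p)] for _ in range(n)]
y = [2.0 * xi[0] + math.exp(0.75 * xi[1]) * random.gauss(0, 1) for xi in X]
beta_hat, alpha_hat = fit(X, y, lam1=2.0, lam2=2.0)
print("mean coefficients:    ", [round(b, 3) for b in beta_hat])
print("variance coefficients:", [round(a, 3) for a in alpha_hat])
```

Both updates are built so that the penalized objective never increases, which keeps the alternation stable without tuning a step size by hand. In practice the two penalty levels would be chosen by cross-validation or an information criterion rather than fixed as they are here.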

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
62J05 Linear regression; mixed models
62N02 Estimation in survival analysis and censored data
65C05 Monte Carlo methods

Software:

glmnet; ggcleveland

References:

[1] Bøvelstad, Predicting survival from microarray data-a comparative study, Bioinformatics 23 pp 2080– (2007) · doi:10.1093/bioinformatics/btm305
[2] Box, Correcting inhomogeneity of variance with power transformation weighting, Technometrics 16 pp 385– (1974) · Zbl 0283.62063 · doi:10.1080/00401706.1974.10489207
[3] Boyd, Convex Optimization (2004) · doi:10.1017/CBO9780511804441
[4] Brem, The landscape of genetic complexity across 5,700 gene expression traits in yeast, Proceedings of the National Academy of Sciences of the United States of America 102 pp 1572– (2005) · doi:10.1073/pnas.0408709102
[5] Breusch, Simple test for heteroscedasticity and random coefficient variation, Econometrica 47 pp 1287– (1979) · Zbl 0416.62021 · doi:10.2307/1911963
[6] Candes, The Dantzig selector: Statistical estimation when p is much larger than n, Annals of Statistics 35 pp 2313– (2007) · Zbl 1139.62019 · doi:10.1214/009053606000001523
[7] Carroll, Transformation and Weighting in Regression (1988) · doi:10.1007/978-1-4899-2873-3
[8] Carroll, Robust estimation in heteroscedastic linear models, Annals of Statistics 10 pp 429– (1982) · Zbl 0497.62034 · doi:10.1214/aos/1176345784
[9] Carroll, The effect of estimating weights in weighted least squares, Journal of the American Statistical Association 83 pp 1045– (1988) · Zbl 0691.62061 · doi:10.1080/01621459.1988.10478699
[10] Cleveland, Visualizing Data (1993)
[11] Cook, Diagnostics for heteroscedasticity in regression, Biometrika 70 pp 1– (1983) · Zbl 0502.62063 · doi:10.1093/biomet/70.1.1
[12] Efron, Least angle regression, Annals of Statistics 32 pp 407– (2004) · Zbl 1091.62054 · doi:10.1214/009053604000000067
[13] Fan, Variable selection via nonconcave penalized likelihood and its oracle properties, Journal of the American Statistical Association 96 pp 1348– (2001) · Zbl 1073.62547 · doi:10.1198/016214501753382273
[14] Friedman, Pathwise coordinate optimization, Annals of Applied Statistics 1 pp 302– (2007) · Zbl 1378.90064 · doi:10.1214/07-AOAS131
[15] Friedman, Regularization paths for generalized linear models via coordinate descent, Journal of Statistical Software 33(1) pp 1– (2010)
[16] Golub, Generalized cross-validation as a method for choosing a good ridge parameter, Technometrics 21 pp 215– (1979) · Zbl 0461.62059 · doi:10.1080/00401706.1979.10489751
[17] Hastie, The Elements of Statistical Learning (2001) · Zbl 0973.62007 · doi:10.1007/978-0-387-21606-5
[18] Jia, Lasso under heteroscedasticity (2009)
[19] Koenker, Quantile Regression (2005) · doi:10.1017/CBO9780511754098
[20] Lange, Robust statistical modeling using the t distribution, Journal of the American Statistical Association 84 pp 881– (1989)
[21] Lee, Learning a prior on regulatory potential from eqtl data, PLoS Genet 5 (1) pp e1000358– (2009) · doi:10.1371/journal.pgen.1000358
[22] Li, l1-norm quantile regressions, Journal of Computational and Graphical Statistics 17 pp 163– (2008) · doi:10.1198/106186008X289155
[23] Li, A system for enhancing genome-wide co-expression dynamics study, Proceedings of the National Academy of Sciences USA 101 pp 15561– (2004) · doi:10.1073/pnas.0402962101
[24] Nocedal, Numerical Optimization (1999) · Zbl 0930.65067 · doi:10.1007/b98874
[25] She, Outlier detection using nonconvex penalized regression, Journal of the American Statistical Association 106 pp 626– (2011) · Zbl 1232.62068 · doi:10.1198/jasa.2011.tm10390
[26] Tibshirani, Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society, Series B 58 pp 267– (1996) · Zbl 0850.62538
[27] Tseng, Coordinate ascent for maximizing nondifferentiable concave functions (1988)
[28] Tseng, Convergence of block coordinate descent method for nondifferentiable minimization, Journal of Optimization Theory and Applications 109 pp 474– (2001) · Zbl 1006.65062 · doi:10.1023/A:1017501703105
[29] Wang, Proceedings of the Sixth International Conference on Data Mining (ICDM06) pp 690– (2006) · doi:10.1109/ICDM.2006.134
[30] Wang, Robust regression shrinkage and consistent variable selection via the lad-lasso, Journal of Business & Economic Statistics 25 pp 347– (2007) · doi:10.1198/073500106000000251
[31] Wu, Genome-wide association analysis by lasso penalized logistic regression, Bioinformatics 25 pp 714– (2009) · doi:10.1093/bioinformatics/btp041
[32] Wu, Variable selection in quantile regression, Statistica Sinica 19 pp 801– (2009) · Zbl 1166.62012
[33] Xu, Simultaneous estimation and variable selection in median regression using lasso-type penalty, Annals of the Institute of Statistical Mathematics 62 pp 487– (2010) · Zbl 1440.62280 · doi:10.1007/s10463-008-0184-2
[34] Zou, The adaptive lasso and its oracle properties, Journal of the American Statistical Association 101 pp 1418– (2006) · Zbl 1171.62326 · doi:10.1198/016214506000000735
[35] Zou, On the "degrees of freedom" of the lasso, Annals of Statistics 35 pp 2173– (2007) · Zbl 1126.62061 · doi:10.1214/009053607000000127