A general family of trimmed estimators for robust high-dimensional data analysis. (English) Zbl 1496.62121

Summary: We consider the problem of robustifying high-dimensional structured estimation. Robust techniques are key in real-world applications, which often involve outliers and data corruption. We focus on trimmed versions of structurally regularized M-estimators in the high-dimensional setting, including the popular Least Trimmed Squares estimator, as well as analogous estimators for generalized linear models and graphical models, using convex and non-convex loss functions. We present a general analysis of their statistical convergence rates and consistency, and then take a closer look at the trimmed versions of the Lasso and Graphical Lasso estimators as special cases. On the optimization side, we show how to extend algorithms for M-estimators to fit trimmed variants and provide guarantees on their numerical convergence. The generality and competitive performance of high-dimensional trimmed estimators are illustrated numerically on both simulated and real-world genomics data.
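To make the trimming idea concrete, the following is a minimal Python sketch (not the authors' exact algorithm) of a trimmed-Lasso fit: it alternates between fitting an ℓ1-penalized least-squares estimate on the h observations currently deemed clean and re-selecting the h observations with the smallest squared residuals, in the spirit of the sparse Least Trimmed Squares estimator of Alfons et al. [1]. The trimming level h, the penalty weight lam, and the use of scikit-learn's Lasso solver are illustrative choices, not taken from the paper.

import numpy as np
from sklearn.linear_model import Lasso

def trimmed_lasso(X, y, h, lam, n_iter=100, seed=0):
    # Alternate between an M-estimation step (Lasso on the current "clean"
    # subset) and a trimming step (keep the h smallest squared residuals).
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    subset = np.sort(rng.choice(n, size=h, replace=False))  # initial guess of clean samples
    for _ in range(n_iter):
        model = Lasso(alpha=lam).fit(X[subset], y[subset])  # fit on trimmed sample
        resid = y - model.predict(X)                        # residuals on all n samples
        new_subset = np.sort(np.argsort(resid ** 2)[:h])    # re-trim
        if np.array_equal(new_subset, subset):              # kept set has stabilized
            break
        subset = new_subset
    return model.coef_, subset

For example, with n = 100 samples of which roughly 10 are suspected to be corrupted, one might call trimmed_lasso(X, y, h=90, lam=0.1). Each alternation decreases the penalized trimmed objective (the fit step minimizes over the coefficients, the trim step over the kept set), so the loop stops at a local optimum; restarting from several random subsets and keeping the best fit is the usual safeguard.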

MSC:

62J07 Ridge regression; shrinkage estimators (Lasso)
62F35 Robustness and adaptive procedures (parametric inference)
62H12 Estimation in multivariate analysis
62H22 Probabilistic graphical models
62P10 Applications of statistics to biology and medical sciences; meta analysis

Software:

glasso; spatial; KEGG

References:

[1] Alfons, A., Croux, C., and Gelper, S. (2013), “Sparse least trimmed squares regression for analyzing high-dimensional large data sets,” The Annals of Applied Statistics, 7, 226–248. · Zbl 1454.62123 · doi:10.1214/12-AOAS575
[2] Aravkin, A. Y. and van Leeuwen, T. (2012), “Estimating nuisance parameters in inverse problems,” Inverse Problems, 28, 115016. · Zbl 1253.49021 · doi:10.1088/0266-5611/28/11/115016
[3] Banerjee, O., El Ghaoui, L., and d’Aspremont, A. (2008), “Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data,” Journal of Machine Learning Research (JMLR), 9, 485–516. · Zbl 1225.68149
[4] Belloni, A., Chernozhukov, V., Kaul, A., Rosenbaum, M., and Tsybakov, A. B. (2017), “Pivotal Estimation via Self-Normalization for High-Dimensional Linear Models with Error in Variables,” arXiv preprint arXiv:1708.08353.
[5] Belloni, A., Chernozhukov, V., and Wang, L. (2011), “Square-root lasso: pivotal recovery of sparse signals via conic programming,” Biometrika, 98, 791–806. · Zbl 1228.62083 · doi:10.1093/biomet/asr043
[6] Bhatia, K., Jain, P., and Kar, P. (2015), “Robust Regression via Hard Thresholding,” in Neur. Info. Proc. Sys. (NIPS).
[7] Boyd, S. and Vandenberghe, L. (2004), Convex optimization, Cambridge, UK: Cambridge University Press. · Zbl 1058.90049
[8] Brem, R. B. and Kruglyak, L. (2005), “The landscape of genetic complexity across 5,700 gene expression traits in yeast,” Proceedings of the National Academy of Sciences of the United States of America, 102, 1572–1577.
[9] Brem, R. B., Storey, J. D., Whittle, J., and Kruglyak, L. (2005), “Genetic interactions between polymorphisms that affect gene expression in yeast,” Nature, 436, 701–703.
[10] Bunea, F. (2008), “Honest variable selection in linear and logistic regression models via \(ℓ_1\) and \(ℓ_1 + ℓ_2\) penalization,” Electronic Journal of Statistics, 2, 1153–1194. · Zbl 1320.62170 · doi:10.1214/08-EJS287
[11] Candès, E., Romberg, J., and Tao, T. (2006), “Stable signal recovery from incomplete and inaccurate measurements,” Communications on Pure and Applied Mathematics, 59, 1207–1223. · Zbl 1098.94009 · doi:10.1002/cpa.20124
[12] Chen, Y., Caramanis, C., and Mannor, S. (2013), “Robust High Dimensional Sparse Regression and Matching Pursuit,” in Proceedings of the International Conference on Machine Learning (ICML).
[13] Chetverikov, D., Liao, Z., and Chernozhukov, V. (2017), “On cross-validated Lasso,” arXiv preprint arXiv:1605.02214.
[14] Cross, G. and Jain, A. (1983), “Markov Random Field Texture Models,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 5, 25–39.
[15] Daye, Z., Chen, J., and Li, H. (2012), “High-Dimensional Heteroscedastic Regression with an Application to eQTL Data Analysis,” Biometrics, 68, 316–326. · Zbl 1241.62152 · doi:10.1111/j.1541-0420.2011.01652.x
[16] Finegold, M. and Drton, M. (2011), “Robust graphical modeling of gene networks using classical and alternative t-distributions,” The Annals of Applied Statistics, 5, 1057–1080. · Zbl 1232.62083 · doi:10.1214/10-AOAS410
[17] Friedman, J., Hastie, T., and Tibshirani, R. (2007), “Sparse inverse covariance estimation with the graphical Lasso,” Biostatistics. · Zbl 1143.62076 · doi:10.1093/biostatistics/kxm045
[18] Golub, G. and Pereyra, V. (2003), “Separable nonlinear least squares: the variable projection method and its applications,” Inverse Problems, 19, R1–R26. · Zbl 1022.65014 · doi:10.1088/0266-5611/19/2/201
[19] Hassner, M. and Sklansky, J. (1978), “Markov Random Field Models of Digitized Image Texture,” in ICPR78, pp. 538–540.
[20] Ising, E. (1925), “Beitrag zur Theorie des Ferromagnetismus,” Zeitschrift für Physik, 31, 253–258. · Zbl 1439.82056
[21] Kanehisa, M., Goto, S., Sato, Y., Kawashima, M., Furumichi, M., and Tanabe, M. (2014), “Data, information, knowledge and principle: back to metabolism in KEGG,” Nucleic Acids Research, 42, D199–D205.
[22] Lambert-Lacroix, S. and Zwald, L. (2011), “Robust regression through the Huber’s criterion and adaptive lasso penalty,” Electronic Journal of Statistics, 5, 1015–1053. · Zbl 1274.62467 · doi:10.1214/11-EJS635
[23] Lauritzen, S. (1996), Graphical models, Oxford University Press, USA. · Zbl 0907.62001
[24] Liu, L., Shen, Y., Li, T., and Caramanis, C. (2018), “High dimensional robust sparse regression,” arXiv preprint arXiv:1805.11643.
[25] Loh, P.-L. and Wainwright, M. J. (2015), “Regularized M-estimators with Nonconvexity: Statistical and Algorithmic Theory for Local Optima,” Journal of Machine Learning Research (JMLR), 16, 559–616. · Zbl 1360.62276
[26] Loh, P.-L. and Wainwright, M. J. (2013), “Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima,” in Neur. Info. Proc. Sys. (NIPS), 26. · Zbl 1360.62276
[27] Manning, C. D. and Schütze, H. (1999), Foundations of Statistical Natural Language Processing, MIT Press. · Zbl 0951.68158
[28] Meinshausen, N. and Bühlmann, P. (2006), “High-dimensional graphs and variable selection with the Lasso,” The Annals of Statistics, 34, 1436–1462. · Zbl 1113.62082 · doi:10.1214/009053606000000281
[29] Negahban, S., Ravikumar, P., Wainwright, M. J., and Yu, B. (2012), “A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers,” Statistical Science, 27, 538–557. · Zbl 1331.62350 · doi:10.1214/12-STS400
[30] Nesterov, Y. (2004), Introductory Lectures on Convex Optimization: A Basic Course, vol. 87 of Applied Optimization, Boston, MA: Kluwer Academic Publishers. · Zbl 1086.90045
[31] Nguyen, N. H. and Tran, T. D. (2013), “Robust Lasso with missing and grossly corrupted observations,” IEEE Transactions on Information Theory, 59, 2036–2058. · Zbl 1364.94146 · doi:10.1109/TIT.2012.2232347
[32] Oh, J. H. and Deasy, J. O. (2014), “Inference of radio-responsive gene regulatory networks using the graphical lasso algorithm,” BMC Bioinformatics, 15, S5.
[33] Prasad, A., Suggala, A. S., Balakrishnan, S., and Ravikumar, P. (2018), “Robust Estimation via Robust Gradient Estimation,” arXiv preprint arXiv:1802.06485.
[34] Raskutti, G., Wainwright, M. J., and Yu, B. (2010), “Restricted Eigenvalue Properties for Correlated Gaussian Designs,” Journal of Machine Learning Research (JMLR), 11, 2241–2259. · Zbl 1242.62071
[35] Ravikumar, P., Wainwright, M. J., Raskutti, G., and Yu, B. (2011), “High-dimensional covariance estimation by minimizing \(ℓ_1\)-penalized log-determinant divergence,” Electronic Journal of Statistics, 5, 935–980. · Zbl 1274.62190 · doi:10.1214/11-EJS631
[36] Recht, B., Fazel, M., and Parrilo, P. A. (2010), “Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization,” SIAM Review, 52, 471–501. · Zbl 1198.90321 · doi:10.1137/070697835
[37] Ripley, B. D. (1981), Spatial statistics, New York: Wiley. · Zbl 0583.62087
[38] Rosenbaum, M. and Tsybakov, A. B. (2010), “Sparse recovery under matrix uncertainty,” The Annals of Statistics, 38, 2620–2651. · Zbl 1373.62357 · doi:10.1214/10-AOS793
[39] Rousseeuw, P. J. (1984), “Least median of squares regression,” Journal of the American Statistical Association, 79, 871–880. · Zbl 0547.62046 · doi:10.1080/01621459.1984.10477105
[40] Stratton, H., Zhou, J., Reed, S., and Stone, D. (1996), “The Mating-Specific Gα Protein of Saccharomyces cerevisiae Downregulates the Mating Signal by a Mechanism That Is Dependent on Pheromone and Independent of Gβγ Sequestration,” Molecular and Cellular Biology.
[41] Sun, H. and Li, H. (2012), “Robust Gaussian graphical modeling via \(ℓ_1\) penalization,” Biometrics, 68, 1197–1206. · Zbl 1259.62102 · doi:10.1111/j.1541-0420.2012.01785.x
[42] Tibshirani, J. and Manning, C. D. (2014), “Robust Logistic Regression using Shift Parameters,” in ACL (2), pp. 124–129.
[43] Tibshirani, R. (1996), “Regression shrinkage and selection via the lasso,” Journal of the Royal Statistical Society, Series B, 58, 267–288. · Zbl 0850.62538 · doi:10.1111/j.2517-6161.1996.tb02080.x
[44] van de Geer, S. and Bühlmann, P. (2009), “On the conditions used to prove oracle results for the Lasso,” Electronic Journal of Statistics, 3, 1360–1392. · Zbl 1327.62425 · doi:10.1214/09-EJS506
[45] Vershynin, R. (2012), “Introduction to the non-asymptotic analysis of random matrices,” in Compressed Sensing: Theory and Applications, eds. Eldar, Y. and Kutyniok, G., Cambridge University Press, pp. 210–268.
[46] Wainwright, M. J. (2009), “Sharp thresholds for high-dimensional and noisy sparsity recovery using \(ℓ_1\)-constrained quadratic programming (Lasso),” IEEE Transactions on Information Theory, 55, 2183–2202. · Zbl 1367.62220 · doi:10.1109/TIT.2009.2016018
[47] Wang, H., Li, G., and Jiang, G. (2007), “Robust regression shrinkage and consistent variable selection through the LAD-lasso,” Journal of Business and Economic Statistics, 25, 347–355.
[48] Woods, J. (1978), “Markov Image Modeling,” IEEE Transactions on Automatic Control, 23, 846–850.
[49] Yang, E. and Ravikumar, P. (2013), “Dirty Statistical Models,” in Neur. Info. Proc. Sys. (NIPS), 26.
[50] Yang, E., Ravikumar, P., Allen, G. I., and Liu, Z. (2012), “Graphical Models via Generalized Linear Models,” in Neur. Info. Proc. Sys. (NIPS), 25.
[51] Yang, E., Tewari, A., and Ravikumar, P. (2013), “On Robust Estimation of High Dimensional Generalized Linear Models,” in International Joint Conference on Artificial Intelligence (IJCAI), 13.
[52] Yuan, M. and Lin, Y. (2007), “Model selection and estimation in the Gaussian graphical model,” Biometrika, 94, 19–35. · Zbl 1142.62408 · doi:10.1093/biomet/asm018
[53] Zhang, X., Xu, C., Zhang, Y., Zhu, T., and Cheng, L. (2017a), “Multivariate Regression with Grossly Corrupted Observations: A Robust Approach and its Applications,” arXiv preprint arXiv:1701.02892.
[54] Zhang, X., Zhao, L., Boedihardjo, A. P., and Lu, C.