High-dimensional robust approximated \(M\)-estimators for mean regression with asymmetric data. (English) Zbl 1520.62095

Summary: Asymmetry, together with heteroscedasticity or contamination, often arises as data dimensionality grows. In ultra-high dimensional data analysis, such irregular settings are usually overlooked for theoretical and computational convenience. In this paper, we establish a framework for estimation in high-dimensional regression models using Penalized Robust Approximated quadratic M-estimators (PRAM). This framework allows general settings, such as random errors that lack symmetry and homogeneity, or covariates that are not sub-Gaussian. To reduce the possible bias caused by data irregularity in mean regression, PRAM adopts a loss function with an adaptive robustification parameter. Theoretically, we first show that, in the ultra-high dimensional setting, PRAM estimators have local estimation consistency at the minimax rate enjoyed by the LS-Lasso. Then we show that PRAM with an appropriate non-convex penalty in fact agrees with the local oracle solution, and thus obtain its oracle property. Computationally, we compare the performance of six PRAM estimators (the Huber, Tukey’s biweight, or Cauchy loss function combined with the Lasso or MCP penalty function). Our simulation studies and real data analysis demonstrate satisfactory finite-sample performance of the PRAM estimators under different irregular settings.
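
As a rough illustration of the six loss/penalty combinations named in the summary, the following Python sketch writes out textbook forms of the Huber, Tukey's biweight, and Cauchy losses together with the Lasso and MCP penalties. It is not the authors' implementation. The threshold parameterization `c` (each loss approaches the quadratic loss \(r^2/2\) as `c` grows), the default `gamma=3.0`, and the function names, including `pram_objective`, are assumptions made here for illustration; the paper's own robustification parameter and its adaptive choice may be defined differently.

```python
import numpy as np

# Textbook robust losses. Assumption: each is scaled so that it tends to the
# quadratic loss r**2 / 2 as the threshold c grows ("approximated quadratic").

def huber_loss(r, c):
    """Quadratic for |r| <= c, linear beyond."""
    r = np.asarray(r, dtype=float)
    a = np.abs(r)
    return np.where(a <= c, 0.5 * r**2, c * a - 0.5 * c**2)

def tukey_loss(r, c):
    """Tukey's biweight: bounded, constant (c**2 / 6) for |r| >= c."""
    r = np.asarray(r, dtype=float)
    u = np.clip(np.abs(r) / c, 0.0, 1.0)
    return (c**2 / 6.0) * (1.0 - (1.0 - u**2) ** 3)

def cauchy_loss(r, c):
    """Cauchy loss: grows logarithmically in |r|."""
    r = np.asarray(r, dtype=float)
    return 0.5 * c**2 * np.log1p((r / c) ** 2)

def lasso_penalty(beta, lam):
    """Elementwise L1 penalty."""
    return lam * np.abs(np.asarray(beta, dtype=float))

def mcp_penalty(beta, lam, gamma=3.0):
    """Elementwise minimax concave penalty; flat for |beta| >= gamma * lam."""
    a = np.abs(np.asarray(beta, dtype=float))
    return np.where(a <= gamma * lam,
                    lam * a - a**2 / (2.0 * gamma),
                    0.5 * gamma * lam**2)

def pram_objective(beta, X, y, loss, c, penalty, lam):
    """Hypothetical penalized objective: mean robust loss plus summed penalty."""
    resid = y - X @ beta
    return np.mean(loss(resid, c)) + np.sum(penalty(beta, lam))
```

Any of the three losses can be paired with either penalty, for example `pram_objective(beta, X, y, tukey_loss, 4.0, mcp_penalty, 0.1)`; the numeric values here are placeholders, not tuning recommendations.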

MSC:

62J07 Ridge regression; shrinkage estimators (Lasso)
62H12 Estimation in multivariate analysis
62F12 Asymptotic properties of parametric estimators
62F35 Robustness and adaptive procedures (parametric inference)
62J05 Linear regression; mixed models
