Variable selection in finite mixture of median regression models using skew-normal distribution. (English) Zbl 07660551

Summary: A regression model with skew-normal errors provides a useful extension of traditional normal regression models when the data involve asymmetric outcomes. Moreover, data that arise from a heterogeneous population can be efficiently analysed by a finite mixture of regression models. These observations motivate us to propose a novel finite mixture of median regression models, based on a mixture of skew-normal distributions, to explore asymmetric data from several subpopulations. With an appropriate choice of the tuning parameters, we establish the theoretical properties of the proposed procedure, including consistency of the variable selection method and the oracle property in estimation. A nonparametric clustering method is applied to select the number of components, and an efficient EM algorithm is developed for numerical computation. Simulation studies and a real data set are used to illustrate the performance of the proposed methodologies.

MSC:

62-XX Statistics

Software:

sn

References:

[1] Akaike, H., Information theory and an extension of the maximum likelihood principle, International Symposium on Information Theory, 1, 610-624 (1973) · Zbl 0283.62006 · doi:10.1007/978-1-4612-1694-0_15
[2] Atienza, N.; Garcia-Heras, J.; Muñoz-Pichardo, J., A new condition for identifiability of finite mixture distributions, Metrika, 63, 2, 215-221 (2006) · Zbl 1095.62016 · doi:10.1007/s00184-005-0013-z
[3] Azzalini, A., A class of distributions which includes the normal ones, Scandinavian Journal of Statistics, 12, 2, 171-178 (1985) · Zbl 0581.62014
[4] Azzalini, A.; Capitanio, A., The skew-normal and related families (2013), Cambridge University Press · Zbl 0924.62050
[5] Chen, J., Consistency of the MLE under mixture models, Statistical Science, 32, 1, 47-63 (2017) · Zbl 1442.62064 · doi:10.1214/16-sts578
[6] Chen, J.; Li, P.; Liu, G., Homogeneity testing under finite location-scale mixtures, Canadian Journal of Statistics, 48, 4, 670-684 (2020) · Zbl 1492.62046 · doi:10.1002/cjs.11557
[7] Chen, J.; Tan, X., Inference for multivariate normal mixtures, Journal of Multivariate Analysis, 100, 7, 1367-1383 (2009) · Zbl 1162.62052 · doi:10.1016/j.jmva.2008.12.005
[8] Cook, R. D.; Weisberg, S., An introduction to regression graphics (1994), John Wiley and Sons · Zbl 0925.62287
[9] Fan, J.; Li, R., Variable selection via nonconcave penalized likelihood and its oracle properties, Journal of the American Statistical Association, 96, 456, 1348-1360 (2001) · Zbl 1073.62547 · doi:10.1198/016214501753382273
[10] Goldfeld, S.; Quandt, R., A Markov model for switching regressions, Journal of Econometrics, 1, 1, 3-15 (1973) · Zbl 0294.62087 · doi:10.1016/0304-4076(73)90002-X
[11] He, M.; Chen, J., Consistency of the MLE under a two-parameter gamma mixture model with a structural shape parameter, Metrika (2022) · Zbl 07596193 · doi:10.1007/s00184-021-00856-9
[12] He, M.; Chen, J., Strong consistency of the MLE under two-parameter gamma mixture models with a structural scale parameter, Advances in Data Analysis and Classification, 16, 1, 125-154 (2022) · Zbl 07538946 · doi:10.1007/s11634-021-00472-5
[13] Hu, D.; Gu, Y.; Zhao, W., Bayesian variable selection for median regression, Chinese Journal of Applied Probability and Statistics, 35, 6, 594-610 (2019) · Zbl 1449.62062
[14] Karlis, D.; Xekalaki, E., Choosing initial values for the EM algorithm for finite mixtures, Computational Statistics & Data Analysis, 41, 3-4, 577-590 (2003) · Zbl 1429.62082 · doi:10.1016/S0167-9473(02)00177-9
[15] Khalili, A.; Chen, J., Variable selection in finite mixture of regression models, Journal of the American Statistical Association, 102, 479, 1025-1038 (2007) · Zbl 1469.62306 · doi:10.1198/016214507000000590
[16] Kottas, A.; Gelfand, A., Bayesian semiparametric median regression modeling, Journal of the American Statistical Association, 96, 456, 1458-1468 (2001) · Zbl 1051.62038 · doi:10.1198/016214501753382363
[17] Li, H.; Wu, L.; Ma, T., Variable selection in joint location, scale and skewness models of the skew-normal distribution, Journal of Systems Science and Complexity, 30, 3, 694-709 (2017) · Zbl 1369.62166 · doi:10.1007/S11424-016-5193-2
[18] Li, H.; Wu, L.; Yi, J., A skew-normal mixture of joint location, scale and skewness models, Applied Mathematics-A Journal of Chinese Universities, 31, 3, 283-295 (2016) · Zbl 1374.62081 · doi:10.1007/S11766-016-3367-2
[19] Li, J.; Ray, S.; Lindsay, B. G., A nonparametric statistical approach to clustering via mode identification, Journal of Machine Learning Research, 8, 8, 1687-1723 (2007) · Zbl 1222.62076
[20] Lin, T.-I.; Lee, J.; Yen, S., Finite mixture modelling using the skew normal distribution, Statistica Sinica, 17, 3, 909-927 (2007) · Zbl 1133.62012
[21] Liu, M.; Lin, T.-I., A skew-normal mixture regression model, Educational and Psychological Measurement, 74, 1, 139-162 (2014) · doi:10.1177/0013164413498603
[22] McLachlan, G.; Peel, D., Finite mixture models (2004), John Wiley and Sons
[23] Otiniano, C. E. G.; Rathie, P. N.; Ozelim, L. C. S. M., On the identifiability of finite mixture of skew-normal and skew-t distributions, Statistics & Probability Letters, 106, 103-108 (2015) · Zbl 1398.62036 · doi:10.1016/j.spl.2015.07.015
[24] Richardson, S.; Green, P., On Bayesian analysis of mixtures with an unknown number of components (with discussion), Journal of the Royal Statistical Society: Series B (Statistical Methodology), 59, 4, 731-792 (1997) · Zbl 0891.62020 · doi:10.1111/1467-9868.00095
[25] Schwarz, G., Estimating the dimension of a model, The Annals of Statistics, 6, 2, 461-464 (1978) · Zbl 0379.62005 · doi:10.1214/AOS/1176344136
[26] Tang, A.; Tang, N., Semiparametric Bayesian inference on skew-normal joint modeling of multivariate longitudinal and survival data, Statistics in Medicine, 34, 5, 824-843 (2015) · doi:10.1002/SIM.6373
[27] Tibshirani, R., Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society. Series B. Statistical Methodology, 58, 1, 267-288 (1996) · Zbl 0850.62538 · doi:10.1111/J.2517-6161.1996.TB02080.X
[28] Titterington, D.; Smith, A.; Makov, U., Statistical analysis of finite mixture distributions (1985), John Wiley and Sons · Zbl 0646.62013
[29] Wang, H.; Li, R.; Tsai, C., Tuning parameter selectors for the smoothly clipped absolute deviation method, Biometrika, 94, 3, 553-568 (2007) · Zbl 1135.62058 · doi:10.1093/BIOMET/ASM053
[30] Wang, P.; Puterman, M.; Cockburn, I.; Le, N., Mixed Poisson regression models with covariate dependent rates, Biometrics, 52, 2, 381-400 (1996) · Zbl 0875.62407 · doi:10.2307/2532881
[31] Wu, L., Variable selection in joint location and scale models of the skew-t-normal distribution, Communications in Statistics. Simulation and Computation, 43, 3, 615-630 (2014) · Zbl 1291.62062 · doi:10.1080/03610918.2012.712182
[32] Wu, L.; Li, S.; Tao, Y., Estimation and variable selection for mixture of joint mean and variance models, Communications in Statistics-Theory and Methods, 50, 24, 6081-6098 (2020) · Zbl 07532243 · doi:10.1080/03610926.2020.1738493
[33] Wu, L.; Zhang, Z.; Xu, D., Variable selection in joint location and scale models of the skew-normal distribution, Journal of Statistical Computation and Simulation, 83, 7, 1266-1278 (2013) · Zbl 1431.62293 · doi:10.1080/00949655.2012.657198
[34] Yao, W.; Li, L., A new regression model: Modal linear regression, Scandinavian Journal of Statistics, 41, 3, 656-671 (2014) · Zbl 1309.62119 · doi:10.1111/SJOS.12054
[35] Yin, J.; Wu, L.; Dai, L., Variable selection in finite mixture of regression models using the skew-normal distribution, Journal of Applied Statistics, 47, 16, 2941-2960 (2020) · Zbl 1521.62541 · doi:10.1080/02664763.2019.1709051
[36] Yin, J.; Wu, L.; Lu, H.; Dai, L., New estimation in mixture of experts models using the Pearson type VII distribution, Communications in Statistics. Simulation and Computation, 49, 2, 472-483 (2020) · Zbl 07552586 · doi:10.1080/03610918.2018.1485943
[37] Zhou, X.; Liu, G., LAD-Lasso variable selection for doubly censored median regression models, Communications in Statistics. Theory and Methods, 45, 12, 3658-3667 (2016) · Zbl 1342.62159 · doi:10.1080/03610926.2014.904357
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.