Robust mixture regression modeling based on the normal mean-variance mixture distributions. (English) Zbl 07708615

Summary: Mixture regression models (MRMs) are widely used to capture heterogeneity in the relationship between a response variable and one or more predictors when the observations come from several non-homogeneous groups. Since conventional MRMs are quite sensitive to departures from normality caused by skewness and heavy tails, various extensions built on more flexible distributions have been put forward in the last decade. The class of normal mean-variance mixture (NMVM) distributions, which arises from scaling both the mean and the variance of a normal random variable by a common mixing variable, encompasses many prominent (symmetric or asymmetric) distributions as special cases. A unified approach to robustifying MRMs is proposed by adopting the class of NMVM distributions for the component errors. An expectation conditional maximization either (ECME) algorithm, which treats the membership indicators and the latent scaling variables as missing data, is developed for maximum likelihood (ML) estimation of the model parameters. Four simulation studies are conducted to examine the finite-sample properties of the ML estimators and the robustness of the proposed model against outliers in contaminated and noisy data. The usefulness and superiority of the methodology are demonstrated through applications to two real datasets.
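As a rough illustration of the construction described in the summary, an NMVM variate can be written as Y = mu + W*lambda + sqrt(W)*sigma*Z, with Z standard normal and W a positive mixing variable independent of Z. The sketch below simulates a mixture regression with such errors. It assumes a gamma mixing variable normalized so that E[W] = 1, which is only one possible member of the class and is not taken from the paper; the function names and parameter values are hypothetical.

```python
import random


def sample_nmvm(mu, lam, sigma, shape, rng):
    """Draw Y = mu + W*lam + sqrt(W)*sigma*Z.

    Assumption (not from the paper): W ~ Gamma(shape, scale=1/shape),
    so E[W] = 1. lam controls skewness; lam = 0 gives a symmetric law.
    """
    w = rng.gammavariate(shape, 1.0 / shape)  # common mixing variable
    z = rng.gauss(0.0, 1.0)                   # standard normal
    return mu + w * lam + (w ** 0.5) * sigma * z


def sample_mixture_regression(n, weights, betas, lams, sigmas, shape, seed=0):
    """Simulate (x, y, group) triples from a mixture of simple linear
    regressions whose errors follow the NMVM construction above."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        # Latent membership indicator, drawn with the mixing proportions.
        g = rng.choices(range(len(weights)), weights=weights)[0]
        b0, b1 = betas[g]
        y = sample_nmvm(b0 + b1 * x, lams[g], sigmas[g], shape, rng)
        data.append((x, y, g))
    return data


if __name__ == "__main__":
    sample = sample_mixture_regression(
        n=5, weights=[0.4, 0.6], betas=[(0.0, 2.0), (1.0, -1.0)],
        lams=[1.0, -0.5], sigmas=[0.5, 0.3], shape=2.0,
    )
    for x, y, g in sample:
        print(f"group={g} x={x:.3f} y={y:.3f}")
```

In this sketch the group label and the mixing variable W are exactly the quantities an EM-type algorithm would treat as missing data; here they are simulated rather than estimated.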

MSC:

62-08 Computational methods for problems pertaining to statistics

Software:

QRM; AS 136; Algorithm 39

References:

[1] Aitken, A. C., On Bernoulli’s numerical solution of algebraic equations, Proc. R. Soc. Edinb., 46, 289-305 (1927) · JFM 52.0098.05
[2] Aitkin, M.; Wilson, G. T., Mixture models, outliers, and the EM algorithm, Technometrics, 22, 325-331 (1980) · Zbl 0466.62034
[3] Akaike, H., Information theory and an extension of the maximum likelihood principle, (Selected Papers of Hirotugu Akaike (1998), Springer), 199-213
[4] Arslan, O., Variance-mean mixture of the multivariate skew normal distribution, Stat. Pap., 56, 353-378 (2015) · Zbl 1309.62043
[5] Azzalini, A.; Capitanio, A., Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t-distribution, J. R. Stat. Soc. B, 65, 367-389 (2003) · Zbl 1065.62094
[6] Bai, X.; Chen, K.; Yao, W., Mixture of linear mixed models using multivariate t distribution, J. Stat. Comput. Simul., 86, 771-787 (2016) · Zbl 1510.62272
[7] Barndorff-Nielsen, O.; Halgreen, C., Infinite divisibility of the hyperbolic and generalized inverse Gaussian distributions, Z. Wahrscheinlichkeitstheor. Verw. Geb., 38, 309-311 (1977) · Zbl 0403.60026
[8] Basford, K.; Greenway, D.; McLachlan, G.; Peel, D., Standard errors of fitted means under normal mixture models, Comput. Stat., 12, 1-17 (1997) · Zbl 0924.62055
[9] Bedrick, E. J.; Tsai, C. L., Model selection for multivariate regression in small samples, Biometrics, 226-231 (1994) · Zbl 0825.62564
[10] Benaglia, T.; Chauveau, D.; Hunter, D. R., An EM-like algorithm for semi- and nonparametric estimation in multivariate mixtures, J. Comput. Graph. Stat., 18, 505-526 (2009)
[11] Benites, L.; Maehara, R.; Lachos, V. H.; Bolfarine, H., Linear regression models using finite mixtures of skew heavy-tailed distributions, Chil. J. Stat., 10 (2019) · Zbl 1434.62146
[12] Birnbaum, Z. W.; Saunders, S. C., A new family of life distributions, J. Appl. Probab., 319-327 (1969) · Zbl 0209.49801
[13] Browne, R. P.; McNicholas, P. D., A mixture of generalized hyperbolic distributions, Can. J. Stat., 43, 176-198 (2015) · Zbl 1320.62144
[14] Capitanio, A.; Azzalini, A.; Stanghellini, E., Graphical models for skew-normal variates, Scand. J. Stat., 30, 129-144 (2003) · Zbl 1035.60008
[15] Cohen, E. A., Some effects of inharmonic partials on interval perception, Music Percept., 1, 323-349 (1984)
[16] Cook, R. D.; Weisberg, S., Residuals and Influence in Regression (1982), Chapman and Hall: New York · Zbl 0564.62054
[17] Dempster, A. P.; Laird, N. M.; Rubin, D. B., Maximum likelihood from incomplete data via the EM algorithm, J. R. Stat. Soc., Ser. B, Stat. Methodol., 39, 1-22 (1977) · Zbl 0364.62022
[18] Desmond, A., On the relationship between two fatigue-life models, IEEE Trans. Reliab., 35, 167-169 (1986) · Zbl 0592.62089
[19] Frühwirth-Schnatter, S., Finite Mixture and Markov Switching Models (2006), Springer: New York · Zbl 1108.62002
[20] Galimberti, G.; Soffritti, G., A multivariate linear regression analysis using finite mixtures of t distributions, Comput. Stat. Data Anal., 71, 138-150 (2014) · Zbl 1471.62070
[21] García-Escudero, L.; Gordaliza, A.; Mayo-Iscar, A.; Martín, R. S., Robust clusterwise linear regression through trimming, Comput. Stat. Data Anal., 54, 3057-3069 (2010) · Zbl 1284.62198
[22] García-Escudero, L. A.; Gordaliza, A., Robustness properties of k means and trimmed k means, J. Am. Stat. Assoc., 94, 956-969 (1999) · Zbl 1072.62547
[23] Gershenfeld, N., Nonlinear inference and cluster-weighted modeling, Ann. N.Y. Acad. Sci., 808, 18-24 (1997)
[24] Goldfeld, S.; Quandt, R., A Markov model for switching regression, J. Econom., 1, 3-15 (1973)
[25] Good, I. J., The population frequencies of species and the estimation of population parameters, Biometrika, 40, 237-264 (1953) · Zbl 0051.37103
[26] Hartigan, J. A.; Wong, M. A., Algorithm AS 136: a k-means clustering algorithm, J. R. Stat. Soc., Ser. C, Appl. Stat., 28, 100-108 (1979) · Zbl 0447.62062
[27] Hennig, C., Identifiability of models for clusterwise linear regression, J. Classif., 17, 273-296 (2000) · Zbl 1017.62058
[28] Hubert, L.; Arabie, P., Comparing partitions, J. Classif., 2, 193-218 (1985)
[29] Hunter, D. R.; Young, D. S., Semiparametric mixtures of regressions, J. Nonparametr. Stat., 24, 19-38 (2012) · Zbl 1241.62055
[30] Ingrassia, S.; Minotti, S. C.; Vittadini, G., Local statistical modeling via the cluster-weighted approach with elliptical distributions, J. Classif., 29, 363-401 (2012) · Zbl 1360.62335
[31] Ingrassia, S.; Minotti, S. C.; Punzo, A., Model-based clustering via linear cluster-weighted models, Comput. Stat. Data Anal., 71, 159-182 (2014) · Zbl 1471.62095
[32] Ingrassia, S.; Punzo, A.; Vittadini, G.; Minotti, S. C., The generalized linear mixed cluster-weighted model, J. Classif., 32, 85-113 (2015) · Zbl 1331.62310
[33] Jacobs, R. A.; Jordan, M. I.; Nowlan, S. J.; Hinton, G. E., Adaptive mixtures of local experts, Neural Comput., 3, 79-87 (1991)
[34] Lindley, D., Fiducial distributions and Bayes theorem, J. R. Stat. Soc., Ser. B, Stat. Methodol., 20, 102-107 (1958) · Zbl 0085.35503
[35] Liu, C.; Rubin, D. B., The ECME algorithm: a simple extension of EM and ECM with faster monotone convergence, Biometrika, 81, 633-648 (1994) · Zbl 0812.62028
[36] Liu, M.; Lin, T. I., A skew-normal mixture regression model, Educ. Psychol. Meas., 74, 139-162 (2014)
[37] Louis, T. A., Finding the observed information matrix when using the EM algorithm, J. R. Stat. Soc., Ser. B, Stat. Methodol., 44, 226-233 (1982) · Zbl 0488.62018
[38] Mazza, A.; Punzo, A., Mixtures of multivariate contaminated normal regression models, Stat. Pap., 61, 787-822 (2017) · Zbl 1435.62238
[39] McNeil, A. J.; Frey, R.; Embrechts, P., Quantitative Risk Management: Concepts, Techniques and Tools-Revised Edition (2005), Princeton University Press · Zbl 1089.91037
[40] Meng, X. L.; Rubin, D. B., Maximum likelihood estimation via the ECM algorithm: a general framework, Biometrika, 80, 267-278 (1993) · Zbl 0778.62022
[41] Mirfarah, E.; Naderi, M.; Chen, D. G., Mixture of linear experts model for censored data: a novel approach with scale-mixture of normal distributions, Comput. Stat. Data Anal., 158, Article 107182 pp. (2021) · Zbl 1510.62281
[42] Naderi, M.; Arabpour, A.; Jamalizadeh, A., Multivariate normal mean-variance mixture distribution based on Lindley distribution, Commun. Stat., Simul. Comput., 47, 1179-1192 (2018) · Zbl 07549517
[43] Naderi, M.; Arabpour, A.; Lin, T. I.; Jamalizadeh, A., Nonlinear regression models based on the normal mean-variance mixture of Birnbaum-Saunders distribution, J. Korean Stat. Soc., 46, 476-485 (2017) · Zbl 1368.62184
[44] Naderi, M.; Hung, W. L.; Lin, T. I.; Jamalizadeh, A., A novel mixture model using the multivariate normal mean-variance mixture of Birnbaum-Saunders distributions and its application to extrasolar planets, J. Multivar. Anal., 171, 126-138 (2019) · Zbl 1417.62176
[45] Pourmousa, R.; Jamalizadeh, A.; Rezapour, M., Multivariate normal mean-variance mixture distribution based on Birnbaum-Saunders distribution, J. Stat. Comput. Simul., 85, 2736-2749 (2015) · Zbl 1457.62164
[46] Punzo, A.; McNicholas, P. D., Robust clustering in regression analysis via the contaminated Gaussian cluster-weighted model, J. Classif., 34, 249-293 (2017) · Zbl 1373.62316
[47] Quandt, R., A new approach to estimating switching regressions, J. Am. Stat. Assoc., 67, 306-310 (1972) · Zbl 0237.62047
[48] Richardson, S.; Green, P. J., On Bayesian analysis of mixtures with an unknown number of components (with discussion), J. R. Stat. Soc., Ser. B, Stat. Methodol., 59, 731-792 (1997) · Zbl 0891.62020
[49] Rousseeuw, P. J.; Leroy, A. M., Robust Regression and Outlier Detection (2005), John Wiley & Sons
[50] Schreuder, H. T.; Hafley, W. L., A useful bivariate distribution for describing stand structure of tree heights and diameters, Biometrics, 33, 471-478 (1977)
[51] Schwarz, G., Estimating the dimension of a model, Ann. Stat., 6, 461-464 (1978) · Zbl 0379.62005
[52] Sclove, S. L., Application of model-selection criteria to some problems in multivariate analysis, Psychometrika, 52, 333-343 (1987)
[53] Song, W.; Yao, W.; Xing, Y., Robust mixture regression model fitting by Laplace distribution, Comput. Stat. Data Anal., 71, 128-137 (2014) · Zbl 1471.62189
[54] Späth, H., Algorithm 39. Clusterwise linear regression, Computing, 22, 367-373 (1979) · Zbl 0387.65028
[55] Stephens, M., Dealing with label switching in mixture models, J. R. Stat. Soc., Ser. B, Stat. Methodol., 62, 795-809 (2000) · Zbl 0957.62020
[56] Tzortzis, G.; Likas, A., The MinMax k-means clustering algorithm, Pattern Recognit., 47, 2505-2516 (2014)
[57] Verbeke, G.; Lesaffre, E., A linear mixed-effects model with heterogeneity in the random-effects population, J. Am. Stat. Assoc., 91, 217-221 (1996) · Zbl 0870.62057
[58] Viele, K.; Tong, B., Modeling with mixtures of linear regressions, Stat. Comput., 12, 315-330 (2002)
[59] Vilca, F.; Balakrishnan, N.; Zeller, C. B., Multivariate skew-normal generalized hyperbolic distribution and its properties, J. Multivar. Anal., 128, 73-85 (2014) · Zbl 1352.62080
[60] Vinh, N. X.; Epps, J.; Bailey, J., Information theoretic measures for clusterings comparison: variants, properties, normalization and correction for chance, J. Mach. Learn. Res., 11, 2837-2854 (2010) · Zbl 1242.62062
[61] Wang, W. L., Mixture of multivariate-t linear mixed models for multi-outcome longitudinal data with heterogeneity, Stat. Sin., 27, 733-760 (2017) · Zbl 1391.62124
[62] Wang, W. L., Mixture of multivariate t nonlinear mixed models for multiple longitudinal data with heterogeneity and missing values, Test, 28, 196-222 (2019) · Zbl 1420.62290
[63] Yang, Y. C.; Lin, T. I.; Castro, L. M.; Wang, W. L., Extending finite mixtures of t linear mixed-effects models with concomitant covariates, Comput. Stat. Data Anal., 148, Article 106961 pp. (2020) · Zbl 1510.62286
[64] Yao, W.; Wei, Y.; Yu, C., Robust mixture regression using the t-distribution, Comput. Stat. Data Anal., 71, 116-127 (2014) · Zbl 1471.62227
[65] Zeller, C. B.; Cabral, C. R.; Lachos, V. H., Robust mixture regression modeling based on scale mixtures of skew-normal distributions, Test, 25, 375-396 (2016) · Zbl 1342.62113