
Robust fitting of mixture regression models. (English) Zbl 1252.62011

Summary: Existing methods for fitting mixture regression models assume a normal distribution for the error and then estimate the regression parameters by maximum likelihood. We demonstrate that the maximum likelihood estimate (MLE), like the least squares estimate, is sensitive to outliers and heavy-tailed error distributions. We propose a robust estimation procedure and an EM-type algorithm to estimate mixture regression models. Using a Monte Carlo simulation study, we demonstrate that the proposed estimation method is robust and works much better than the MLE when there are outliers or the error distribution has heavy tails. In addition, the proposed robust method performs comparably to the MLE when there are no outliers and the error is normal. A real data application illustrates the success of the proposed robust estimation procedure.
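The summary does not spell out the algorithm. As a rough illustration only, the Python sketch below shows one common way to make an EM-type algorithm for mixture regression robust. The E-step computes posterior component probabilities as usual. In the M-step, each component's posterior-weighted least squares fit is replaced by one step of Huber-weighted iteratively reweighted least squares. The following choices are our assumptions and not necessarily the paper's: Huber's psi with c = 1.345, the weighted-MAD scale update, the single-predictor model y = a + b*x, and all function names (`robust_mixreg`, `make_demo_data`, and so on).

```python
import math
import random


def huber_weight(r, c=1.345):
    """IRLS weight psi(r)/r for Huber's psi at standardized residual r."""
    a = abs(r)
    return 1.0 if a <= c else c / a


def log_normal_pdf(y, mu, sigma):
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)


def weighted_line_fit(xs, ys, ws):
    """Weighted least squares for the line y = a + b*x; returns (a, b)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def weighted_median(values, weights):
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    acc = 0.0
    for v, w in pairs:
        acc += w
        if acc >= half:
            return v
    return pairs[-1][0]


def robust_mixreg(xs, ys, init, n_iter=100, c=1.345):
    """Hypothetical EM-type fit of a K-component mixture of lines.

    init: list of (pi_j, a_j, b_j, sigma_j) starting values.
    """
    comps = [tuple(t) for t in init]
    n = len(xs)
    for _ in range(n_iter):
        # E-step: posterior membership probabilities, computed in log space
        post = []
        for x, y in zip(xs, ys):
            logs = [math.log(max(p, 1e-300)) + log_normal_pdf(y, a + b * x, s)
                    for p, a, b, s in comps]
            top = max(logs)
            ex = [math.exp(lg - top) for lg in logs]
            tot = sum(ex)
            post.append([e / tot for e in ex])
        # M-step: one Huber-reweighted least squares step per component
        new = []
        for j, (p, a, b, s) in enumerate(comps):
            tau = [row[j] for row in post]
            rob = [huber_weight((y - a - b * x) / s, c) for x, y in zip(xs, ys)]
            ws = [t * r for t, r in zip(tau, rob)]
            a_new, b_new = weighted_line_fit(xs, ys, ws)
            # Robust scale: tau-weighted median absolute residual / 0.6745
            res = [abs(y - a_new - b_new * x) for x, y in zip(xs, ys)]
            s_new = max(weighted_median(res, tau) / 0.6745, 1e-6)
            new.append((sum(tau) / n, a_new, b_new, s_new))
        comps = new
    return comps


def make_demo_data(seed=1):
    """Two lines, y = 3x and y = 10 - 2x, plus ten gross outliers at y = 50."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(100):
        x = rng.uniform(0, 5)
        xs.append(x)
        ys.append(3 * x + rng.gauss(0, 0.3))
    for _ in range(100):
        x = rng.uniform(0, 5)
        xs.append(x)
        ys.append(10 - 2 * x + rng.gauss(0, 0.3))
    for _ in range(10):
        xs.append(rng.uniform(0, 5))
        ys.append(50.0)
    return xs, ys


if __name__ == "__main__":
    xs, ys = make_demo_data()
    fit = robust_mixreg(xs, ys, [(0.5, 0.5, 2.5, 1.0), (0.5, 9.0, -1.5, 1.0)])
    for p, a, b, s in fit:
        print(f"pi={p:.2f}  a={a:.2f}  b={b:.2f}  sigma={s:.2f}")
```

The ten planted outliers sit far from both lines. Their Huber weights shrink roughly like c*sigma/|residual|, so each one has only bounded influence on the weighted least squares step, and both fitted lines stay close to the generating ones. A redescending psi function, such as Tukey's bisquare, is a common alternative that drives the weight of gross outliers all the way to zero.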

MSC:

62F10 Point estimation
62F35 Robustness and adaptive procedures (parametric inference)
62J05 Linear regression; mixed models
65C05 Monte Carlo methods

Software:

robustbase
