A combined likelihood ratio/information ratio bootstrap technique for estimating the number of components in finite mixtures. (English) Zbl 1471.62164

Summary: Modified MIR is a Monte Carlo algorithm that bootstraps minimum information ratios in order to assess the unknown number of components in a finite mixture. The method was proposed as a modification of the minimum information ratio (MIR) method and proved to outperform it. Further simulations and a comparison with some other approaches confirm that the method works well for reasonable sample sizes. However, an important drawback of information-ratio-driven methods is that they do not allow testing the hypothesis of a single-component model. To overcome this problem, a combined method is proposed that includes a bootstrap likelihood ratio step and a modified MIR step in a single programming package. Bootstrap likelihood ratio methods generally perform well, so the combined method is expected to be adequate for detecting single-component models, while its performance is expected to be very similar to that of modified MIR when the model is a true mixture. A simulation exercise confirms this expectation. This result supports using the combined method rather than modified MIR in practical applications.
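
As an illustration of the bootstrap likelihood ratio step mentioned in the summary, the following is a minimal sketch of a parametric bootstrap test of one versus two normal components. It is not the authors' package: the univariate normal setting, the median-split EM start, the variance floor, and the function names `loglik_single`, `loglik_two`, and `bootstrap_lrt` are all assumptions made for the example, and the modified MIR step is not shown because the summary does not describe its details.

```python
import numpy as np

def loglik_single(x):
    """Maximised log-likelihood of one normal component (closed form)."""
    sigma = x.std()  # maximum likelihood estimate (divides by n)
    return -0.5 * x.size * (np.log(2 * np.pi * sigma**2) + 1.0)

def loglik_two(x, n_iter=200, tol=1e-8):
    """Log-likelihood reached by a basic EM fit of two normal components.

    Assumed choices: median-split starting values and a floor on the
    standard deviations to avoid degenerate components.
    """
    med = np.median(x)
    w = np.array([0.5, 0.5])
    mu = np.array([x[x <= med].mean(), x[x > med].mean()])
    sd = np.array([x.std(), x.std()])
    floor = 1e-3 * x.std()
    prev = -np.inf
    for _ in range(n_iter):
        # E-step: weighted component densities and responsibilities
        dens = w * np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
        mix = dens.sum(axis=1) + 1e-300
        ll = np.log(mix).sum()
        if ll - prev < tol:
            break
        prev = ll
        resp = dens / mix[:, None]
        # M-step: update weights, means, and standard deviations
        nk = resp.sum(axis=0) + 1e-12
        w = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sd = np.maximum(np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk), floor)
    return max(ll, prev)

def bootstrap_lrt(x, n_boot=99, seed=0):
    """Parametric bootstrap p-value for H0: one component vs H1: two."""
    rng = np.random.default_rng(seed)

    def stat(y):
        # Likelihood ratio statistic, truncated at zero in case EM stalls
        return max(0.0, 2.0 * (loglik_two(y) - loglik_single(y)))

    t_obs = stat(x)
    mu0, sd0 = x.mean(), x.std()  # fitted null model
    # Simulate from the fitted null and recompute the statistic each time
    t_boot = np.array([stat(rng.normal(mu0, sd0, x.size)) for _ in range(n_boot)])
    p_value = (1 + np.sum(t_boot >= t_obs)) / (n_boot + 1)
    return t_obs, p_value
```

Calling `bootstrap_lrt(x)` on a one-dimensional NumPy array returns the observed statistic and a bootstrap p-value. In a combined procedure of the kind the summary describes, a non-significant p-value would point to a single-component model, and otherwise the number of components would be assessed by the modified MIR step.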

MSC:

62-08 Computational methods for problems pertaining to statistics
62F10 Point estimation
62F40 Bootstrap, jackknife and other resampling methods
62H30 Classification and discrimination; cluster analysis (statistical aspects)
