
Multivariate methods using mixtures: correspondence analysis, scaling and pattern-detection. (English) Zbl 1471.62162

Summary: Matrices of binary or count data are modelled under a unified statistical framework using finite mixtures to group the rows and/or columns. These likelihood-based one-mode and two-mode fuzzy clusterings provide maximum likelihood estimation of parameters and the options of using likelihood ratio tests or information criteria for model comparison. Geometric developments focused on pattern detection give likelihood-based analogues of various techniques in multivariate analysis, including multidimensional scaling, association analysis, ordination, correspondence analysis, and the construction of biplots. Illustrative examples demonstrate the effectiveness of these visualisations for identifying patterns of ecological significance (e.g. abrupt versus slow species turnover).
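As a rough illustration of the one-mode (row) clustering described in the summary, the sketch below fits a finite Bernoulli mixture to the rows of a binary matrix by EM and compares one-group and two-group fits by AIC. The parameterisation (group proportions `pi`, group-by-column probabilities `theta`), the function names and the synthetic data are all assumptions made for this example; they are not taken from the paper, which may parameterise its models differently.

```python
import numpy as np


def _log_joint(Y, pi, theta):
    """Log of pi[r] * P(row i | group r) for every row i and group r."""
    log_lik = Y @ np.log(theta).T + (1.0 - Y) @ np.log(1.0 - theta).T
    return log_lik + np.log(pi)


def fit_row_mixture(Y, n_groups, n_iter=200, seed=0, eps=1e-9):
    """EM for a Bernoulli mixture over the rows of a binary matrix Y.

    Assumed model (illustrative only): row i belongs to group r with
    probability pi[r]; given group r, entries y_ij are independent
    Bernoulli(theta[r, j]).
    Returns (pi, theta, z, log_likelihood, aic), where z holds the
    posterior (fuzzy) membership probabilities.
    """
    rng = np.random.default_rng(seed)
    n, p = Y.shape
    pi = np.full(n_groups, 1.0 / n_groups)
    theta = rng.uniform(0.25, 0.75, size=(n_groups, p))
    for _ in range(n_iter):
        # E-step: posterior membership probabilities for each row.
        lj = _log_joint(Y, pi, theta)
        z = np.exp(lj - np.logaddexp.reduce(lj, axis=1, keepdims=True))
        # M-step: update proportions and Bernoulli probabilities,
        # clipped away from 0 and 1 so the logarithms stay finite.
        pi = np.clip(z.mean(axis=0), eps, None)
        pi = pi / pi.sum()
        theta = np.clip(z.T @ Y / (z.sum(axis=0)[:, None] + eps), eps, 1.0 - eps)
    # Log-likelihood and memberships at the final parameter values.
    lj = _log_joint(Y, pi, theta)
    log_norm = np.logaddexp.reduce(lj, axis=1, keepdims=True)
    z = np.exp(lj - log_norm)
    log_likelihood = float(log_norm.sum())
    n_params = (n_groups - 1) + n_groups * p
    aic = -2.0 * log_likelihood + 2.0 * n_params
    return pi, theta, z, log_likelihood, aic


# Synthetic presence/absence data (hypothetical): two groups of 20 sites
# with opposite occupancy patterns across 12 species.
rng = np.random.default_rng(1)
probs = np.vstack([
    np.tile(np.r_[np.full(6, 0.9), np.full(6, 0.1)], (20, 1)),
    np.tile(np.r_[np.full(6, 0.1), np.full(6, 0.9)], (20, 1)),
])
Y = (rng.random(probs.shape) < probs).astype(float)

_, _, z1, ll1, aic1 = fit_row_mixture(Y, n_groups=1)
_, _, z2, ll2, aic2 = fit_row_mixture(Y, n_groups=2)
labels = z2.argmax(axis=1)  # hard assignment from the fuzzy memberships
print(f"AIC, 1 group: {aic1:.1f}; AIC, 2 groups: {aic2:.1f}")
```

The rows of `z2` are the fuzzy memberships; taking the `argmax` is just one convenient way to read off a hard clustering. The same scheme could in principle be adapted to count data by swapping the Bernoulli term for a Poisson one, though that variant is not shown here.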

MSC:

62-08 Computational methods for problems pertaining to statistics
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62H25 Factor analysis and principal components; correspondence analysis

Software:

R; TWINSPAN

References:

[1] Akaike, H., Information theory and an extension of the maximum likelihood principle, (Petrov, B. N.; Csaki, F., Second International Symposium on Information Theory, (1973), Academiai Kiado), 267-281 · Zbl 0283.62006
[2] Arnold, R.; Hayakawa, Y.; Yip, P., Capture-recapture estimation using finite mixtures of arbitrary dimension, Biometrics, 66, 644-655, (2010) · Zbl 1192.62251
[3] Banfield, J. D.; Raftery, A. E., Model-based Gaussian and non-Gaussian clustering, Biometrics, 49, 803-821, (1993) · Zbl 0794.62034
[4] Biernacki, C.; Celeux, G.; Govaert, G., Assessing a mixture model for clustering with the integrated completed likelihood, IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 719-725, (2000)
[5] Böhning, D.; Seidel, W.; Alfò, M.; Garel, B.; Patilea, V.; Günther, W., Editorial: advances in mixture models, Computational Statistics and Data Analysis, 51, 5205-5210, (2007) · Zbl 1445.00012
[6] Burnham, K. P.; Anderson, D. R., Model selection and multimodel inference: A practical information-theoretic approach, (2002), Springer New York · Zbl 1005.62007
[7] Dempster, A. P.; Laird, N. M.; Rubin, D. B., Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society. Series B, 39, 1-38, (1977), (with discussion) · Zbl 0364.62022
[8] Dunstan, P. K.; Foster, S. D.; Darnell, R., Model based grouping of species across environmental gradients, Ecological Modelling, 222, 955-963, (2011)
[9] Everitt, B. S.; Landau, S.; Leese, M., Cluster analysis, (2001), Arnold London · Zbl 1205.62076
[10] Gabriel, K. R., The biplot graphic display of matrices with application to principal component analysis, Biometrika, 58, 453-467, (1971) · Zbl 0228.62034
[11] Gotelli, N. J.; Graves, G. R., Null models in ecology, (1996), Smithsonian Institution Press Washington DC
[12] Govaert, G.; Nadif, M., Clustering with block mixture models, Pattern Recognition, 36, 463-473, (2003)
[13] Govaert, G.; Nadif, M., An EM algorithm for the block mixture model, IEEE Transactions on Pattern Analysis and Machine Intelligence, 27, 4, 643-647, (2005)
[14] Govaert, G.; Nadif, M., Latent block model for contingency table, Communications in Statistics—Theory and Methods, 39, 416-425, (2010) · Zbl 1187.62117
[15] Greenacre, M.; Hastie, T., The geometric interpretation of correspondence analysis, Journal of the American Statistical Association, 82, 437-447, (1987) · Zbl 0622.62006
[16] Hill, M. O., TWINSPAN—A FORTRAN program for arranging multivariate data in an ordered two-way table by classification of the individuals and attributes, Section of Ecology and Systematics, 90, (1979), Cornell University New York, NY, USA
[17] Kirkpatrick, S.; Gelatt, C. D.; Vecchi, M. P., Optimization by simulated annealing, Science, 220, 671-680, (1983) · Zbl 1225.90162
[18] Manly, B. F.J., Multivariate methods: A primer, (2005), Chapman & Hall/CRC Press Boca Raton, FL · Zbl 1048.62055
[19] Manly, B. F.J., Randomization, bootstrap and Monte Carlo methods in biology, (2007), Chapman and Hall London · Zbl 1269.62076
[20] McLachlan, G. J., The classification and mixture maximum likelihood approaches to cluster analysis, Handbook of Statistics, 2, 199-208, (1982) · Zbl 0513.62064
[21] McLachlan, G. J., On bootstrapping the likelihood ratio test statistic for the number of components in a normal mixture, Applied Statistics, 36, 318-324, (1987)
[22] McLachlan, G. J.; Basford, K. E., Mixture models: inference and applications to clustering, (1988), M. Dekker New York, NY · Zbl 0697.62050
[23] McLachlan, G. J.; Krishnan, T., The EM algorithm and extensions, (1997), Wiley Interscience New York · Zbl 0882.62012
[24] McLachlan, G. J.; Peel, D., Finite mixture models, (2000), Wiley Interscience New York · Zbl 0963.62061
[25] Nadif, M.; Govaert, G., Block clustering of contingency table and mixture model, (Advances in Intelligent Data Analysis VI, 6th International Symposium on Intelligent Data Analysis, (2005)) · Zbl 1165.68418
[26] O’Hagan, A.; Murphy, T. B.; Gormley, I. C., Computational aspects of fitting mixture models via the expectation-maximization algorithm, Computational Statistics and Data Analysis, 56, 3843-3864, (2012) · Zbl 1255.62180
[27] Pledger, S., Unified maximum likelihood estimates for closed capture-recapture models using mixtures, Biometrics, 56, 434-442, (2000) · Zbl 1060.62652
[28] Quinn, G. P.; Keough, M. J., Experimental design and data analysis for biologists, (2002), Cambridge University Press
[29] R Development Core Team, R: A language and environment for statistical computing, (2010), R Foundation for Statistical Computing Vienna, Austria, ISBN: 3-900051-07-0
[30] Robinson, W. S., A method for chronologically ordering archaeological deposits, American Antiquity, 16, 293-301, (1951)
[31] Schlattmann, P., Estimating the number of components in a finite mixture model: the special case of homogeneity, Computational Statistics and Data Analysis, 41, 441-451, (2003) · Zbl 1429.62087
[32] Schwarz, G. E., Estimating the dimension of a model, Annals of Statistics, 6, 461-464, (1978) · Zbl 0379.62005
[33] Self, S. G.; Liang, K.-Y., Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions, Journal of the American Statistical Association, 82, 605-610, (1987) · Zbl 0639.62020
[34] Shaw, P. J.A., Multivariate statistics for the environmental sciences, (2003), Arnold London · Zbl 1028.62087
[35] Shaw, P. J.A.; Kibby, G.; Mayes, J., Effects of thinning treatment on an ectomycorrhizal succession under Scots pine, Mycological Research, 107, 317-328, (2003)
[36] van der Geer, S., Asymptotic theory for maximum likelihood in nonparametric mixture models, Computational Statistics and Data Analysis, 41, 453-464, (2003) · Zbl 1429.62117
[37] Whittaker, R. H., Vegetation of the Great Smoky Mountains, Ecological Monographs, 26, 1-80, (1956)
[38] Wu, X.; Kumar, V.; Quinlan, J. R.; Ghosh, J.; Yang, Q.; Motoda, H.; McLachlan, G. J.; Ng, A.; Liu, B.; Yu, P. S.; Zhou, Z.-H.; Steinbach, M.; Hand, D. J.; Steinberg, D., Top 10 algorithms in data mining, Knowledge and Information Systems, 14, 1-37, (2008)
[39] Wu, H.-M.; Tzeng, S.; Chen, C.-H., Matrix visualization, (Handbook of Data Visualization, (2007), Springer Berlin), 681-708 · Zbl 1140.68533
[40] Zhou, H.; Lange, K. L., On the bumpy road to the dominant mode, Scandinavian Journal of Statistics, 37, 612-631, (2010) · Zbl 1226.62027
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.