From here to infinity: sparse finite versus Dirichlet process mixtures in model-based clustering. (English) Zbl 1474.62225

Summary: In model-based clustering, mixture models are used to group data points into clusters. A useful concept introduced for Gaussian mixtures by G. Malsiner-Walli et al. [Stat. Comput. 26, No. 1–2, 303–324 (2016; Zbl 1342.62109)] is that of sparse finite mixtures, where the prior on the weight distribution of a mixture with \(K\) components is chosen in such a way that, a priori, the number of clusters in the data is random and is allowed to be smaller than \(K\) with high probability. The number of clusters is then inferred a posteriori from the data. The present paper makes the following contributions in the context of sparse finite mixture modelling. First, it is illustrated that the concept of sparse finite mixtures is very generic and is easily extended to cluster various types of non-Gaussian data, in particular discrete data and continuous multivariate data arising from non-Gaussian clusters. Second, sparse finite mixtures are compared to Dirichlet process mixtures with respect to their ability to identify the number of clusters. For both model classes, a random hyper prior is considered for the parameters determining the weight distribution. By suitable matching of these priors, it is shown that the choice of this hyper prior has far more influence on the cluster solution than whether a sparse finite mixture or a Dirichlet process mixture is taken into consideration.
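
The prior behaviour described in the summary can be illustrated with a small simulation. The following Python sketch draws the prior number of non-empty clusters among \(N\) observations, once for a finite mixture with a symmetric Dirichlet prior with parameter \(e_0\) on the \(K\) weights (via its Pólya urn representation) and once for a Dirichlet process with concentration \(\alpha\) (via the Chinese restaurant process). The function names, the values \(K=10\), \(e_0=0.01\), \(N=100\), and the choice \(\alpha = K e_0\) are assumptions made for illustration only; they are not settings taken from the paper, and no hyper prior on \(e_0\) or \(\alpha\) is included.

```python
import numpy as np


def clusters_sparse_finite(K, e0, N, rng):
    """One prior draw of the number of non-empty components.

    Uses the Polya urn form of the Dirichlet-multinomial: observation i
    joins component k with probability proportional to n_k + e0.
    """
    counts = np.zeros(K)
    for _ in range(N):
        p = counts + e0
        k = rng.choice(K, p=p / p.sum())
        counts[k] += 1
    return int(np.count_nonzero(counts))


def clusters_dirichlet_process(alpha, N, rng):
    """One prior draw of the number of clusters under a Dirichlet process.

    Uses the Chinese restaurant process: an existing cluster k is chosen
    with probability proportional to n_k, a new one proportional to alpha.
    """
    counts = []
    for _ in range(N):
        p = np.array(counts + [alpha], dtype=float)
        k = rng.choice(len(p), p=p / p.sum())
        if k == len(counts):
            counts.append(1)      # open a new cluster
        else:
            counts[k] += 1        # join an existing cluster
    return len(counts)


def prior_mean_clusters(draw, n_draws=200):
    """Monte Carlo estimate of the prior mean number of clusters."""
    return float(np.mean([draw() for _ in range(n_draws)]))


# Illustrative (hypothetical) settings, not values from the paper.
K, N = 10, 100
rng = np.random.default_rng(0)

sparse = prior_mean_clusters(lambda: clusters_sparse_finite(K, 0.01, N, rng))
dense = prior_mean_clusters(lambda: clusters_sparse_finite(K, 4.0, N, rng))
dpm = prior_mean_clusters(lambda: clusters_dirichlet_process(K * 0.01, N, rng))

print(f"finite mixture, e0 = 0.01 : mean number of clusters = {sparse:.2f}")
print(f"finite mixture, e0 = 4    : mean number of clusters = {dense:.2f}")
print(f"Dirichlet process, alpha = {K * 0.01:.2f}: mean number of clusters = {dpm:.2f}")
```

With a small \(e_0\), most of the \(K\) components stay empty a priori, which is the sparsity the summary refers to; a large \(e_0\) tends to fill all components. Setting \(\alpha = K e_0\) reflects the standard result that a finite mixture with a Dirichlet\((\alpha/K)\) prior on the weights approaches a Dirichlet process mixture as \(K \to \infty\).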

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
60G12 General second-order stochastic processes

Citations:

Zbl 1342.62109

References:

[1] Aitkin M (1996) A general maximum likelihood analysis of overdispersion in generalized linear models. Stat Comput 6:251-262 · doi:10.1007/BF00140869
[2] Azzalini A (1985) A class of distributions which includes the normal ones. Scand J Stat 12:171-178 · Zbl 0581.62014
[3] Azzalini A (1986) Further results on a class of distributions which includes the normal ones. Statistica 46:199-208 · Zbl 0606.62013
[4] Azzalini A, Capitanio A (2003) Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t-distribution. J R Stat Soc Ser B 65:367-389 · Zbl 1065.62094 · doi:10.1111/1467-9868.00391
[5] Azzalini A, Dalla Valle A (1996) The multivariate skew normal distribution. Biometrika 83:715-726 · Zbl 0885.62062 · doi:10.1093/biomet/83.4.715
[6] Banfield JD, Raftery AE (1993) Model-based Gaussian and non-Gaussian clustering. Biometrics 49:803-821 · Zbl 0794.62034 · doi:10.2307/2532201
[7] Bennett DA, Schneider JA, Buchman AS, de Leon CM, Bienias JL, Wilson RS (2005) The Rush Memory and Aging Project: study design and baseline characteristics of the study cohort. Neuroepidemiology 25:163-175 · doi:10.1159/000087446
[8] Bensmail H, Celeux G, Raftery AE, Robert CP (1997) Inference in model-based cluster analysis. Stat Comput 7:1-10 · doi:10.1023/A:1018510926151
[9] Biernacki C, Celeux G, Govaert G (2000) Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans Pattern Anal Mach Intell 22:719-725 · doi:10.1109/34.865189
[10] Celeux G, Forbes F, Robert CP, Titterington DM (2006) Deviance information criteria for missing data models. Bayesian Anal 1:651-674 · Zbl 1331.62329 · doi:10.1214/06-BA122
[11] Celeux G, Frühwirth-Schnatter S, Robert CP (2018) Model selection for mixture models—perspectives and strategies. In: Frühwirth-Schnatter S, Celeux G, Robert CP (eds), pp 121-160. Boca Raton
[12] Clogg CC, Goodman LA (1984) Latent structure analysis of a set of multidimensional contingency tables. J Am Stat Assoc 79:762-771 · Zbl 0547.62037 · doi:10.1080/01621459.1984.10477093
[13] Dellaportas P, Papageorgiou I (2006) Multivariate mixtures of normals with unknown number of components. Stat Comput 16:57-68 · doi:10.1007/s11222-006-5338-6
[14] Escobar MD, West M (1995) Bayesian density estimation and inference using mixtures. J Am Stat Assoc 90:577-588 · Zbl 0826.62021 · doi:10.1080/01621459.1995.10476550
[15] Escobar MD, West M (1998) Computing nonparametric hierarchical models. In: Dey D, Müller P, Sinha D (eds), pp 1-22. Berlin · Zbl 0918.62028
[16] Fall MD, Barat É (2014) Gibbs sampling methods for Pitman-Yor mixture models. Working paper https://hal.archives-ouvertes.fr/hal-00740770/file/Fall-Barat.pdf
[17] Ferguson TS (1973) A Bayesian analysis of some nonparametric problems. Ann Stat 1:209-230 · Zbl 0255.62037 · doi:10.1214/aos/1176342360
[18] Ferguson TS (1974) Prior distributions on spaces of probability measures. Ann Stat 2:615-629 · Zbl 0286.62008 · doi:10.1214/aos/1176342752
[19] Ferguson TS (1983) Bayesian density estimation by mixtures of normal distributions. In: Rizvi MH, Rustagi JS (eds), pp 287-302. New York · Zbl 0557.62030 · doi:10.1016/B978-0-12-589320-6.50018-6
[20] Frühwirth-Schnatter S (2004) Estimating marginal likelihoods for mixture and Markov switching models using bridge sampling techniques. Econom J 7:143-167 · Zbl 1053.62087 · doi:10.1111/j.1368-423X.2004.00125.x
[21] Frühwirth-Schnatter S (2006) Finite mixture and Markov switching models. Springer, New York · Zbl 1108.62002
[22] Frühwirth-Schnatter S (2011) Dealing with label switching under model uncertainty. In: Mengersen K, Robert CP, Titterington D (eds), pp 213-239. Chichester · doi:10.1002/9781119995678.ch10
[23] Frühwirth-Schnatter S (2011) Label switching under model uncertainty. In: Mengersen K, Robert CP, Titterington D (eds), pp 213-239. Hoboken · doi:10.1002/9781119995678.ch10
[24] Frühwirth-Schnatter S, Pyne S (2010) Bayesian inference for finite mixtures of univariate and multivariate skew normal and skew-t distributions. Biostatistics 11:317-336 · Zbl 1437.62465 · doi:10.1093/biostatistics/kxp062
[25] Frühwirth-Schnatter S, Wagner H (2008) Marginal likelihoods for non-Gaussian models using auxiliary mixture sampling. Comput Stat Data Anal 52:4608-4624 · Zbl 1452.62060 · doi:10.1016/j.csda.2008.03.028
[26] Frühwirth-Schnatter S, Frühwirth R, Held L, Rue H (2009) Improved auxiliary mixture sampling for hierarchical models of non-Gaussian data. Stat Comput 19:479-492 · doi:10.1007/s11222-008-9109-4
[27] Frühwirth-Schnatter S, Celeux G, Robert CP (eds) (2018) Handbook of mixture analysis. CRC Press, Boca Raton
[28] Goodman LA (1974) Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika 61:215-231 · Zbl 0281.62057 · doi:10.1093/biomet/61.2.215
[29] Green PJ, Richardson S (2001) Modelling heterogeneity with and without the Dirichlet process. Scand J Stat 28:355-375 · Zbl 0973.62031 · doi:10.1111/1467-9469.00242
[30] Grün B (2018) Model-based clustering. In: Frühwirth-Schnatter S, Celeux G, Robert CP (eds), pp 163-198. Boca Raton
[31] Hubert L, Arabie P (1985) Comparing partitions. J Classif 2(1):193-218 · Zbl 0587.62128 · doi:10.1007/BF01908075
[32] Ishwaran H, James LF (2001) Gibbs sampling methods for stick-breaking priors. J Am Stat Assoc 96:161-173 · Zbl 1014.62006 · doi:10.1198/016214501750332758
[33] Kalli M, Griffin JE, Walker SG (2011) Slice sampling mixture models. Stat Comput 21:93-105 · Zbl 1256.65006 · doi:10.1007/s11222-009-9150-y
[34] Keribin C (2000) Consistent estimation of the order of mixture models. Sankhyā A 62:49-66 · Zbl 1081.62516
[35] Lau JW, Green P (2007) Bayesian model-based clustering procedures. J Comput Graph Stat 16:526-558 · doi:10.1198/106186007X238855
[36] Lazarsfeld PF, Henry NW (1968) Latent structure analysis. Houghton Mifflin, New York · Zbl 0182.52201
[37] Lee S, McLachlan GJ (2013) Model-based clustering and classification with non-normal mixture distributions. Stat Methods Appl 22:427-454 · Zbl 1332.62209 · doi:10.1007/s10260-013-0237-4
[38] Linzer DA, Lewis JB (2011) poLCA: an R package for polytomous variable latent class analysis. J Stat Softw 42(10):1-29 · doi:10.18637/jss.v042.i10
[39] Malsiner-Walli G, Frühwirth-Schnatter S, Grün B (2016) Model-based clustering based on sparse finite Gaussian mixtures. Stat Comput 26:303-324 · Zbl 1342.62109 · doi:10.1007/s11222-014-9500-2
[40] Malsiner-Walli G, Frühwirth-Schnatter S, Grün B (2017) Identifying mixtures of mixtures using Bayesian estimation. J Comput Graph Stat 26:285-295 · Zbl 1342.62109 · doi:10.1080/10618600.2016.1200472
[41] Malsiner-Walli G, Pauger D, Wagner H (2018) Effect fusion using model-based clustering. Stat Model 18:175-196 · Zbl 07289504 · doi:10.1177/1471082X17739058
[42] McLachlan GJ, Peel D (2000) Finite mixture models. Wiley series in probability and statistics. Wiley, New York · Zbl 0963.62061 · doi:10.1002/0471721182
[43] Medvedovic M, Yeung KY, Bumgarner RE (2004) Bayesian mixture model based clustering of replicated microarray data. Bioinformatics 20:1222-1232 · doi:10.1093/bioinformatics/bth068
[44] Miller JW, Harrison MT (2013) A simple example of Dirichlet process mixture inconsistency for the number of components. In: Advances in neural information processing systems, pp 199-206
[45] Miller JW, Harrison MT (2018) Mixture models with a prior on the number of components. J Am Stat Assoc 113:340-356 · Zbl 1398.62066 · doi:10.1080/01621459.2016.1255636
[46] Müller P, Mitra R (2013) Bayesian nonparametric inference—why and how. Bayesian Anal 8:269-360 · Zbl 1329.62171 · doi:10.1214/13-BA811
[47] Nobile A (2004) On the posterior distribution of the number of components in a finite mixture. Ann Stat 32:2044-2073 · Zbl 1056.62037 · doi:10.1214/009053604000000788
[48] Papaspiliopoulos O, Roberts G (2008) Retrospective Markov chain Monte Carlo methods for Dirichlet process hierarchical models. Biometrika 95:169-186 · Zbl 1437.62576 · doi:10.1093/biomet/asm086
[49] Polson NG, Scott JG, Windle J (2013) Bayesian inference for logistic models using Pólya-Gamma latent variables. J Am Stat Assoc 108:1339-1349 · Zbl 1283.62055 · doi:10.1080/01621459.2013.829001
[50] Quintana FA, Iglesias PL (2003) Bayesian clustering and product partition models. J R Stat Soc Ser B 65:557-574 · Zbl 1065.62115 · doi:10.1111/1467-9868.00402
[51] Richardson S, Green PJ (1997) On Bayesian analysis of mixtures with an unknown number of components. J R Stat Soc Ser B 59:731-792 · Zbl 0891.62020 · doi:10.1111/1467-9868.00095
[52] Rousseau J, Mengersen K (2011) Asymptotic behaviour of the posterior distribution in overfitted mixture models. J R Stat Soc Ser B 73:689-710 · Zbl 1228.62034 · doi:10.1111/j.1467-9868.2011.00781.x
[53] Sethuraman J (1994) A constructive definition of Dirichlet priors. Stat Sin 4:639-650 · Zbl 0823.62007
[54] Stern H, Arcus D, Kagan J, Rubin DB, Snidman N (1994) Statistical choices in infant temperament research. Behaviormetrika 21:1-17 · doi:10.2333/bhmk.21.1
[55] van Havre Z, White N, Rousseau J, Mengersen K (2015) Overfitting Bayesian mixture models with an unknown number of components. PLoS ONE 10(7):e0131739, 1-27
[56] Viallefont V, Richardson S, Green PJ (2002) Bayesian analysis of Poisson mixtures. J Nonparametr Stat 14:181-202 · Zbl 1014.62035 · doi:10.1080/10485250211383
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.