Mixture models with an unknown number of components via a new posterior split-merge MCMC algorithm. (English) Zbl 1335.62061

Summary: In this paper we introduce a Bayesian analysis for mixture models with an unknown number of components via a new posterior split-merge MCMC algorithm. Our splitting strategy is data-driven: allocation probabilities are calculated from the posterior distribution given the previously allocated observations. This procedure is easy to implement and yields a quick split proposal. The acceptance probabilities for split-merge moves are calculated according to the Metropolised Carlin and Chib procedure. The performance of the proposed algorithm is verified using artificial datasets as well as two real datasets. The first real dataset is the benchmark galaxy data, while the second is a publicly available dataset on the Escherichia coli bacterium.

MSC:

62F15 Bayesian inference
62F10 Point estimation

Software:

Silhouettes
Full Text: DOI

References:

[1] Arfin, S. M.; Long, A. D.; Ito, E. T.; Tolleri, L.; Riehle, M. M.; Paegle, E. S.; Hatfield, G. W., Global gene expression profiling in Escherichia coli K12, J. Biol. Chem., 275, 29672-29684 (2000)
[2] Bhattacharya, S., Gibbs sampling based Bayesian analysis of mixtures with unknown number of components, Sankhyā, 70, 133-155 (2008) · Zbl 1192.62073
[3] Carlin, B. P.; Chib, S., Bayesian model choice via Markov chain Monte Carlo methods, J. R. Stat. Soc. B, 57, 3, 473-484 (1995) · Zbl 0827.62027
[5] Chen, G.; Jaradat, S. A.; Banerjee, N.; Tanaka, T. S.; Ko, M. S.H.; Zhang, M. Q., Evaluation and comparison of clustering algorithms in analyzing ES cell gene expression data, Stat. Sin., 12, 241-262 (2002) · Zbl 0997.62086
[6] Chib, S., Marginal likelihood from the Gibbs output, J. Am. Stat. Assoc., 90, 1313-1321 (1995) · Zbl 0868.62027
[7] Chib, S.; Greenberg, E., Understanding the Metropolis-Hastings algorithm, Amer. Stat., 49, 327-335 (1995)
[8] Dellaportas, P.; Forster, J. J.; Ntzoufras, I., On Bayesian model and variable selection using MCMC, Stat. Comput., 12, 27-36 (2002) · Zbl 1247.62086
[9] Dellaportas, P.; Papageorgiou, I., Multivariate mixtures of normals with unknown number of components, Stat. Comput., 16, 57-68 (2006)
[10] Diebolt, J.; Robert, C., Discussion of Bayesian computations via the Gibbs sampler by A.F.M. Smith and G. Roberts, J. R. Stat. Soc. B, 55, 71-72 (1993)
[11] Diebolt, J.; Robert, C., Estimation of finite mixture distributions by Bayesian sampling, J. R. Stat. Soc. B, 56, 363-375 (1994) · Zbl 0796.62028
[12] Eddy, W. F., A new convex hull algorithm for planar sets, ACM Trans. Math. Softw., 3, 398-403 (1977) · Zbl 0374.68036
[13] Escobar, M. D.; West, M., Bayesian density estimation and inference using mixtures, J. Am. Stat. Assoc., 90, 577-588 (1995) · Zbl 0826.62021
[14] Ferguson, T. S., A Bayesian analysis of some nonparametric problems, Ann. Stat., 1, 209-230 (1973) · Zbl 0255.62037
[15] Frühwirth-Schnatter, S., Markov chain Monte Carlo estimation of classical and dynamic switching and mixture models, J. Am. Stat. Assoc., 96, 194-209 (2001) · Zbl 1015.62022
[16] Godsill, S. J., On the relationship between Markov chain Monte Carlo methods for model uncertainty, J. Comput. Graphical Stat., 10, 230-248 (2001)
[17] Jain, S.; Neal, R., A split-merge Markov chain Monte Carlo procedure for the Dirichlet process mixture model, J. Comput. Graphical Stat., 13, 1, 158-182 (2004)
[18] Jain, S.; Neal, R., Splitting and merging components of a nonconjugate Dirichlet process mixture model, Bayesian Anal., 2, 445-472 (2007) · Zbl 1331.62145
[19] Jasra, A.; Holmes, C. C.; Stephens, D. A., Markov chain Monte Carlo methods and the label switching problem in Bayesian mixture modeling, Stat. Sci., 20, 50-67 (2005) · Zbl 1100.62032
[20] Mengersen, K.; Robert, C. P., Testing for mixtures: a Bayesian entropic approach, (Bernardo, J. M.; Berger, J. O.; Dawid, A. P.; Smith, A. F.M., Bayesian Statistics, vol. 5 (1996), Oxford University Press), 255-276
[21] Nobile, A.; Fearnside, A. T., Bayesian finite mixtures with an unknown number of components: the allocation sampler, Stat. Comput., 17, 147-162 (2007)
[22] Richardson, S.; Green, P. J., On Bayesian analysis of mixtures with an unknown number of components, J. R. Stat. Soc. B, 59, 731-792 (1997) · Zbl 0891.62020
[24] Rousseeuw, P. J., Silhouettes: a graphical aid to the interpretation and validation of cluster analysis, J. Comput. Appl. Math., 20, 53-65 (1987) · Zbl 0636.62059
[25] Stephens, M., Bayesian analysis of mixture models with an unknown number of components - an alternative to reversible jump methods, Ann. Stat., 28, 40-74 (2000)
[26] Stephens, M., Dealing with label switching in mixture models, J. R. Stat. Soc. B, 62, 795-809 (2000) · Zbl 0957.62020
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.