
Variational approximations in Bayesian model selection for finite mixture distributions. (English) Zbl 1445.62050

Summary: Variational methods, which have become popular in the neural computing and machine learning literature, are applied to the Bayesian analysis of mixtures of Gaussian distributions. It is also shown how the deviance information criterion (DIC) can be extended to these types of model by exploiting variational approximations. The use of variational methods for model selection and the calculation of a DIC are illustrated with real and simulated data. The variational approach allows the simultaneous estimation of the component parameters and of the model complexity. It is found that an initial choice of a large number of components results in superfluous components being eliminated as the method converges to a solution, corresponding to an automatic choice of model complexity. The appropriateness of this choice is reflected in the DIC values.
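The component-elimination behaviour described in the summary can be sketched with a toy variational Bayes routine for a univariate Gaussian mixture under conjugate Dirichlet and Normal-Gamma priors (the standard VB-GMM updates). This is an illustrative sketch, not the authors' implementation: the 1-D restriction, the prior values, and the series approximation of the digamma function are all assumptions made here. With a small Dirichlet concentration `alpha0`, superfluous components receive vanishing posterior weight, mirroring the automatic choice of model complexity the summary describes:

```python
import math
import random

def digamma(x):
    """Asymptotic-series approximation of the digamma function (absent from stdlib math)."""
    r = 0.0
    while x < 6.0:
        r -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return r + math.log(x) - 0.5 / x - f * (1.0 / 12 - f * (1.0 / 120 - f / 252))

def vb_gmm_1d(data, K=6, iters=300, alpha0=1e-3, beta0=1.0, m0=0.0, a0=1.0, b0=1.0, seed=0):
    """Variational Bayes for a univariate Gaussian mixture.
    Priors: pi ~ Dir(alpha0), tau_k ~ Gamma(a0, b0), mu_k | tau_k ~ N(m0, 1/(beta0 tau_k)).
    A small alpha0 drives the weights of redundant components toward zero."""
    rng = random.Random(seed)
    n = len(data)
    m = [rng.choice(data) for _ in range(K)]          # means start at random data points
    alpha = [alpha0 + n / K] * K
    beta = [beta0 + n / K] * K
    a = [a0 + n / (2 * K)] * K
    b = [b0] * K
    for _ in range(iters):
        # E-step: expected log responsibilities, normalised via log-sum-exp.
        dg_sum = digamma(sum(alpha))
        e_lnpi = [digamma(alpha[k]) - dg_sum for k in range(K)]
        e_lntau = [digamma(a[k]) - math.log(b[k]) for k in range(K)]
        r = []
        for x in data:
            logr = [e_lnpi[k] + 0.5 * e_lntau[k]
                    - 0.5 * (1.0 / beta[k] + (a[k] / b[k]) * (x - m[k]) ** 2)
                    for k in range(K)]
            mx = max(logr)
            w = [math.exp(v - mx) for v in logr]
            s = sum(w)
            r.append([v / s for v in w])
        # M-step: update the variational posterior parameters.
        for k in range(K):
            Nk = sum(r[i][k] for i in range(n)) + 1e-12
            xbar = sum(r[i][k] * data[i] for i in range(n)) / Nk
            Sk = sum(r[i][k] * (data[i] - xbar) ** 2 for i in range(n)) / Nk
            alpha[k] = alpha0 + Nk
            beta[k] = beta0 + Nk
            m[k] = (beta0 * m0 + Nk * xbar) / beta[k]
            a[k] = a0 + 0.5 * Nk
            b[k] = b0 + 0.5 * (Nk * Sk + beta0 * Nk * (xbar - m0) ** 2 / beta[k])
    tot = sum(alpha)
    return [ak / tot for ak in alpha], m              # posterior mean weights and means

# Two well-separated clusters, fitted with deliberately too many components.
rng = random.Random(1)
data = ([rng.gauss(0.0, 1.0) for _ in range(100)]
        + [rng.gauss(6.0, 1.0) for _ in range(100)])
weights, means = vb_gmm_1d(data, K=6)
print([round(w, 3) for w in weights])
```

After convergence most of the posterior weight typically concentrates on two components, with the remainder shrunk to near zero, which is the elimination effect the summary attributes to the variational approach.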

MSC:

62F15 Bayesian inference
62H30 Classification and discrimination; cluster analysis (statistical aspects)

References:

[1] Andrieu, C.; de Freitas, N.; Doucet, A.; Jordan, M. I., An introduction to MCMC for machine learning, Mach. Learning, 50, 5-43 (2003) · Zbl 1033.68081
[2] Attias, H., 1999. Inferring parameters and structure of latent variable models by variational Bayes. In: Proceedings of Conference on Uncertainty in Artificial Intelligence.
[3] Berg, A.; Meyer, R.; Yu, J., Deviance information criterion for comparing stochastic volatility models, J. Bus. Econ. Statist., 22, 107-119 (2004)
[4] Bishop, C.M., Winn, J., 2003. Structural variational distributions in VIBES, in: Bishop, C.M., Frey, B. (Eds.), Proceedings of Artificial Intelligence, Florida.
[5] Celeux, G., Forbes, F., Robert, C., Titterington, D.M., 2006. Deviance information criteria for missing data models, Bayesian Anal., 1, to appear. · Zbl 1331.62329
[6] Corduneanu, A.; Bishop, C. M., Variational Bayesian model selection for mixture distributions, (Jaakkola, T.; Richardson, T., Artificial Intelligence and Statistics (2001), Morgan Kaufmann: Morgan Kaufmann Los Altos, CA), 27-34
[7] Dempster, A.P., 1974. The direct use of likelihood for significance testing, in: Proceedings of Conference on Foundational Questions in Statistical Inference, University of Aarhus, pp. 335-352. · Zbl 0367.62004
[8] Doucet, A., de Freitas, N., Gordon, N.J. (Eds.), 2001. Sequential Monte Carlo Methods in Practice, Springer, Berlin. · Zbl 0967.00022
[9] Gelfand, A. E.; Smith, A. F.M., Sampling-based approaches to calculating marginal densities, J. Amer. Statist. Assoc., 85, 398-409 (1990) · Zbl 0702.62020
[10] Geyer, C., Practical Markov chain Monte Carlo (with discussion), Statist. Sci., 7, 473-483 (1992)
[11] Ghahramani, Z.; Beal, M., Propagation algorithms for variational learning, (Touretzky, D. S.; Mozer, M. C.; Hasselmo, M. E., Advances in Neural Information Processing, vol. 13 (2001), MIT Press: MIT Press Cambridge, MA)
[12] Gilks, W. R.; Best, N. G.; Tan, K. K.C., Adaptive rejection Metropolis sampling within Gibbs sampling, J. Appl. Statist., 44, 455-472 (1995) · Zbl 0893.62110
[13] Gilks, W.R., Richardson, S., Spiegelhalter, D.J. (Eds.), 1996. Markov Chain Monte Carlo in Practice, Chapman & Hall, London. · Zbl 0832.00018
[14] Green, P. J., Reversible jump Markov chain Monte Carlo computation and Bayesian model determination, Biometrika, 82, 711-732 (1995) · Zbl 0861.62023
[15] Green, P. J.; Richardson, S., Hidden Markov models and disease mapping, J. Amer. Statist. Assoc., 97, 1055-1070 (2002) · Zbl 1046.62117
[16] Han, C.; Carlin, B. P., Markov chain Monte Carlo methods for computing Bayes factors: a comparative review, J. Amer. Statist. Assoc., 96, 1122-1132 (2001)
[17] Jordan, M. I., Graphical models, Statist. Sci., 19, 140-155 (2004) · Zbl 1057.62001
[18] Jordan, M. I.; Ghahramani, Z.; Jaakkola, T. S.; Saul, L. K., An introduction to variational methods for graphical models, Mach. Learning, 37, 183-233 (1999) · Zbl 0945.68164
[19] McGrory, C.A., Titterington, D.M., 2006. Bayesian analysis of hidden Markov models using variational approximations. Submitted for publication. · Zbl 1337.62015
[20] Neal, R.M., 1996. Bayesian Learning for Neural Networks, Lecture Notes in Statistics, vol. 118, Springer, New York. · Zbl 0888.62021
[21] Postman, M.; Huchra, J. P.; Geller, M. J., Probes of large-scale structure in the Corona Borealis region, Astronomical J., 92, 1238-1247 (1986)
[22] Richardson, S.; Green, P. J., On Bayesian analysis of mixtures with an unknown number of components (with discussion), J. Roy. Statist. Soc. Ser. B, 59, 731-792 (1997) · Zbl 0891.62020
[23] Robert, C. P.; Rydén, T.; Titterington, D. M., Bayesian inference in hidden Markov models through the reversible jump Markov chain Monte Carlo method, J. Roy. Statist. Soc. Ser. B, 62, 57-75 (2000) · Zbl 0941.62090
[24] Roeder, K., Density estimation with confidence sets exemplified by superclusters and voids in the galaxies, J. Amer. Statist. Assoc., 85, 617-624 (1990) · Zbl 0704.62103
[25] Schwarz, G., Estimating the dimension of a model, Ann. Statist., 6, 461-464 (1978) · Zbl 0379.62005
[26] Spiegelhalter, D. J.; Best, N. G.; Carlin, B. P.; Van der Linde, A., Bayesian measures of model complexity and fit (with discussion), J. Roy. Statist. Soc. Ser. B, 64, 583-639 (2002) · Zbl 1067.62010
[27] Stephens, M., Bayesian analysis of mixture models with an unknown number of components—an alternative to reversible jump methods, Ann. Statist., 28, 40-74 (2000) · Zbl 1106.62316
[28] Tierney, L., Markov chains for exploring posterior distributions, Ann. Statist., 22, 1701-1728 (1994) · Zbl 0829.62080
[29] Titterington, D. M., Bayesian methods for neural networks and related models, Statist. Sci., 19, 128-139 (2004) · Zbl 1057.62078
[30] Titterington, D. M.; Smith, A. F.M.; Makov, U. E., Statistical Analysis of Finite Mixture Distributions (1985), Wiley: Wiley Chichester · Zbl 0646.62013
[31] Ueda, N.; Ghahramani, Z., Bayesian model search for mixture models based on optimizing variational bounds, Neural Networks, 15, 1223-1241 (2002)
[32] Wang, B., Titterington, D.M., 2006. Convergence properties of a general algorithm for calculating variational Bayesian estimates for a normal mixture model. Bayesian Anal., 1, 625-650. · Zbl 1331.62168
[33] Waterhouse, S.; MacKay, D.; Robinson, T., Bayesian methods for mixtures of experts, (Touretzky, D. S.; etal., Advances in Neural Information Processing Systems, vol. 8 (1996), MIT Press: MIT Press Cambridge, MA)
[34] Winn, J.; Bishop, C. M., Variational message passing, J. Machine Learning Res., 6, 661-694 (2005) · Zbl 1222.68332
[35] Zhu, L.; Carlin, B. P., Comparing hierarchical models for spatio-temporally misaligned data using the deviance information criterion, Statist. Med., 19, 2265-2278 (2000)