
A tutorial on Bayesian nonparametric models. (English) Zbl 1237.62062

Summary: A key problem in statistical modeling is model selection, that is, how to choose a model at an appropriate level of complexity. This problem appears in many settings, most prominently in choosing the number of clusters in mixture models or the number of factors in factor analysis. We describe Bayesian nonparametric methods, a class of methods that sidesteps this issue by allowing the data to determine the complexity of the model. This tutorial is a high-level introduction to Bayesian nonparametric methods and contains several examples of their applications.
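
To make the summary's central idea concrete — that the data, rather than a fixed model choice, determine the number of components — consider the Chinese restaurant process (CRP), the canonical Bayesian nonparametric prior over partitions used in Dirichlet process mixture models of the kind this tutorial covers. The Python sketch below is a minimal illustration, not code from the paper; the function name sample_crp and the concentration parameter alpha are our own notation.

```python
import numpy as np

def sample_crp(n, alpha, rng):
    """Sample a random partition of n items from a Chinese restaurant
    process with concentration parameter alpha.

    Returns table assignments per item and the table occupancy counts;
    each table corresponds to one mixture component."""
    assignments = [0]  # the first customer always sits at table 0
    counts = [1]       # number of customers at each occupied table
    for i in range(1, n):
        # Customer i joins table k with probability counts[k] / (i + alpha)
        # and starts a new table with probability alpha / (i + alpha).
        probs = np.array(counts + [alpha]) / (i + alpha)
        table = rng.choice(len(probs), p=probs)
        if table == len(counts):
            counts.append(1)        # a new table: model complexity grows
        else:
            counts[table] += 1
        assignments.append(table)
    return assignments, counts

rng = np.random.default_rng(0)
for n in (10, 100, 1000):
    _, counts = sample_crp(n, alpha=1.0, rng=rng)
    # The number of occupied tables grows roughly as alpha * log(n),
    # so the effective number of clusters adapts to the data size.
    print(n, len(counts))
```

Running the sketch shows the number of occupied tables increasing slowly with n, which is precisely the sense in which such models let the data determine complexity instead of fixing the number of clusters in advance.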

MSC:

62G99 Nonparametric inference
62F15 Bayesian inference
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62H25 Factor analysis and principal components; correspondence analysis
65C60 Computational problems in statistics (MSC2010)

Software:

PRMLT; BayesDA
