
Principled selection of hyperparameters in the latent Dirichlet allocation model. (English) Zbl 1471.62283

Summary: Latent Dirichlet Allocation (LDA) is a well-known topic model that is often used to draw inferences about the properties of collections of text documents. LDA is a hierarchical Bayesian model that involves a prior distribution on a set of latent topic variables. The prior is indexed by certain hyperparameters, and even though these have a large impact on inference, they are usually chosen either in an ad hoc manner or by applying an algorithm whose theoretical basis has not been firmly established. We present a method, based on a combination of Markov chain Monte Carlo and importance sampling, for computing the maximum likelihood estimate of the hyperparameters. The method may be viewed as a computational scheme for implementing an empirical Bayes analysis. It comes with theoretical guarantees, and a key feature of our approach is that we provide theoretically valid error margins for our estimates. Experiments on both synthetic and real data show good performance of our methodology.
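The core identity behind such estimators can be stated compactly: if z denotes the latent topic assignments and m(h) the marginal likelihood of the corpus at hyperparameter value h, then m(h)/m(h1) equals the posterior expectation, under a fixed reference value h1, of the ratio p(w, z | h)/p(w, z | h1), so a single MCMC run at h1 yields estimates of the likelihood surface over all h. Below is a minimal Python sketch of this idea for collapsed LDA with symmetric Dirichlet(alpha) document-topic and Dirichlet(eta) topic-word priors. It is an illustration only, not the authors' implementation (which additionally addresses the choice of reference points via tempering-style ideas and supplies valid error margins); the toy `samples` are a stand-in for count matrices collected from a collapsed Gibbs sampler run at h1, and all names in the sketch are ours.

    import numpy as np
    from scipy.special import gammaln

    def log_joint(doc_topic, topic_word, alpha, eta):
        # log p(w, z | alpha, eta) for collapsed LDA: a product of Dirichlet
        # normalizing constants, one per document (topic counts n_d) and one
        # per topic (word counts m_k), under symmetric priors alpha and eta.
        def term(counts, conc):
            # log [ B(counts + conc*1) / B(conc*1) ] for one count vector
            dim = len(counts)
            num = gammaln(counts + conc).sum() - gammaln(counts.sum() + dim * conc)
            den = dim * gammaln(conc) - gammaln(dim * conc)
            return num - den
        return (sum(term(n_d, alpha) for n_d in doc_topic)
                + sum(term(m_k, eta) for m_k in topic_word))

    def log_marginal_ratio(samples, h, h1):
        # Importance-sampling estimate of log [ m(h) / m(h1) ]: average the
        # joint-likelihood ratio over posterior samples drawn at h1.
        (alpha, eta), (alpha1, eta1) = h, h1
        log_w = np.array([log_joint(dt, tw, alpha, eta)
                          - log_joint(dt, tw, alpha1, eta1)
                          for dt, tw in samples])
        peak = log_w.max()  # log-mean-exp for numerical stability
        return peak + np.log(np.exp(log_w - peak).mean())

    # Toy stand-in for Gibbs output: 50 samples of (doc-topic, topic-word)
    # count matrices for D=20 documents, K=3 topics, V=100 vocabulary words.
    # In practice these would come from a collapsed Gibbs run at h1.
    rng = np.random.default_rng(0)
    samples = [(rng.poisson(5.0, (20, 3)), rng.poisson(2.0, (3, 100)))
               for _ in range(50)]

    # Empirical Bayes estimate: maximize the estimated ratio over a grid of h.
    h1 = (0.1, 0.1)
    grid = [(a, e) for a in np.linspace(0.05, 1.0, 20)
                   for e in np.linspace(0.05, 1.0, 20)]
    h_hat = max(grid, key=lambda h: log_marginal_ratio(samples, h, h1))
    print("estimated (alpha, eta):", h_hat)

A grid search is used here purely for transparency; any numerical optimizer over h could replace it, since the ratio estimate is a smooth function of the hyperparameters.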

MSC:

62F15 Bayesian inference
60J22 Computational methods in Markov chains
68T05 Learning and adaptive systems in artificial intelligence

Software:

mcmcse; MALLET; gensim
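Of the packages above, gensim illustrates the standard practice the summary alludes to: hyperparameters are either fixed ad hoc or updated by built-in heuristics. A minimal sketch of the latter, assuming a current gensim installation (the three-document corpus is a toy stand-in):

    from gensim.corpora import Dictionary
    from gensim.models import LdaModel

    # Hypothetical toy corpus; real use would tokenize actual documents.
    texts = [["topic", "model", "inference", "prior"],
             ["dirichlet", "prior", "topic", "hyperparameter"],
             ["gibbs", "sampler", "topic", "model"]]
    dictionary = Dictionary(texts)
    corpus = [dictionary.doc2bow(t) for t in texts]

    # alpha='auto' and eta='auto' ask gensim to update the Dirichlet
    # hyperparameters from the data during training, rather than
    # keeping them at fixed default values.
    lda = LdaModel(corpus, id2word=dictionary, num_topics=2,
                   alpha="auto", eta="auto", passes=10)
    print(lda.alpha, lda.eta)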

References:

[1] Christophe Andrieu, Nando de Freitas, Arnaud Doucet, and Michael I. Jordan. An introduction to MCMC for machine learning. Machine Learning, 50:5–43, 2003. · Zbl 1033.68081
[2] Arthur Asuncion, Max Welling, Padhraic Smyth, and Yee Whye Teh. On smoothing and inference for topic models. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, UAI ’09, pages 27–34, Arlington, Virginia, United States, 2009. AUAI Press.
[3] James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13:281–305, 2012. · Zbl 1283.68282
[4] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003. · Zbl 1112.68379
[5] Zhe Chen. Inference for the Number of Topics in the Latent Dirichlet Allocation Model via Bayesian Mixture Modelling. PhD thesis, University of Florida, 2015.
[6] Zhe Chen and Hani Doss. Inference for the number of topics in the latent Dirichlet allocation model via Bayesian mixture modelling. Technical report, Department of Statistics, University of Florida, 2017.
[7] Persi Diaconis, Kshitij Khare, and Laurent Saloff-Coste. Gibbs sampling, exponential families and orthogonal polynomials (with discussion). Statistical Science, 23:151–178, 2008. · Zbl 1327.62058
[8] Hani Doss and Clint P. George. Theoretical and empirical evaluation of a grouped Gibbs sampler for parallel computation in the LDA model. Technical report, Department of Statistics, University of Florida, 2017.
[9] Hani Doss and Yeonhee Park. An MCMC approach to empirical Bayes inference and Bayesian sensitivity analysis via empirical processes. Annals of Statistics (to appear), 2018. · Zbl 1403.62016
[10] James M. Flegal, Murali Haran, and Galin L. Jones. Markov chain Monte Carlo: Can we trust the third significant figure? Statistical Science, 23:250–260, 2008. · Zbl 1327.62017
[11] James M. Flegal, John Hughes, and Dootika Vats. mcmcse: Monte Carlo Standard Errors for MCMC. Riverside, CA and Minneapolis, MN, 2016. R package version 1.2-1.
[12] Gersende Fort and Eric Moulines. Convergence of the Monte Carlo expectation maximization for curved exponential families. The Annals of Statistics, 31:1220–1259, 2003. · Zbl 1043.62015
[13] David Freedman. Wald Lecture: On the Bernstein-von Mises theorem with infinite-dimensional parameters. The Annals of Statistics, 27:1119–1141, 1999. · Zbl 0957.62002
[14] Claudio Fuentes, Vikneshwaran Gopal, George Casella, Clint P. George, Taylor C. Glenn, Joseph N. Wilson, and Paul D. Gader. Product partition models for Dirichlet allocation. Technical report, Department of Computer and Information Science and Engineering, University of Florida, 2011.
[15] Clint P. George. Latent Dirichlet Allocation: Hyperparameter Selection and Applications to Electronic Discovery. PhD thesis, University of Florida, 2015.
[16] Clint P. George and Hani Doss. Supplement to “Principled selection of hyperparameters in the latent Dirichlet allocation model,” 2018.
[17] Edward I. George and Dean P. Foster. Calibration and empirical Bayes variable selection. Biometrika, 87:731–747, 2000. · Zbl 1029.62008
[18] Charles J. Geyer and Elizabeth A. Thompson. Annealing Markov chain Monte Carlo with applications to ancestral inference. Journal of the American Statistical Association, 90:909–920, 1995. · Zbl 0850.62834
[19] Thomas L. Griffiths and Mark Steyvers. Finding scientific topics. Proceedings of the National Academy of Sciences, 101:5228–5235, 2004.
[20] W. K. Hastings. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57:97–109, 1970. · Zbl 0219.65008
[21] James P. Hobert and George Casella. The effect of improper priors on Gibbs sampling in hierarchical linear mixed models. Journal of the American Statistical Association, 91(436):1461–1473, 1996. · Zbl 0882.62020
[22] Galin L. Jones, Murali Haran, Brian S. Caffo, and Ronald Neath. Fixed-width output analysis for Markov chain Monte Carlo. Journal of the American Statistical Association, 101:1537–1547, 2006. · Zbl 1171.62316
[23] J. S. Liu, W. H. Wong, and A. Kong. Covariance structure of the Gibbs sampler with applications to the comparisons of estimators and augmentation schemes. Biometrika, 81:27–40, 1994. · Zbl 0811.62080
[24] Enzo Marinari and Giorgio Parisi. Simulated tempering: A new Monte Carlo scheme. Europhysics Letters, 19:451–458, 1992.
[25] Andrew Kachites McCallum. MALLET: A machine learning for language toolkit. http://mallet.cs.umass.edu, 2002.
[26] Thomas P. Minka. Estimating a Dirichlet distribution, 2003. URL http://research.microsoft.com/~minka/papers/dirichlet/.
[27] David Newman, Arthur Asuncion, Padhraic Smyth, and Max Welling. Distributed algorithms for topic models. Journal of Machine Learning Research, 10:1801–1828, 2009. · Zbl 1235.68324
[28] Radim Řehůřek and Petr Sojka. Software framework for topic modelling with large corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pages 45–50, Valletta, Malta, 2010. ELRA.
[29] Christian P. Robert. The Bayesian Choice: From Decision-Theoretic Foundations to Computational Implementation. Springer-Verlag, New York, 2001. · Zbl 0980.62005
[30] Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101:1566–1581, 2006. · Zbl 1171.62349
[31] Martin J. Wainwright and Michael I. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1–2):1–305, 2008. · Zbl 1193.62107
[32] Hanna M. Wallach. Topic modeling: Beyond bag-of-words. In Proceedings of the 23rd International Conference on Machine Learning, ICML ’06, pages 977–984, New York, NY, USA, 2006. ACM.
[33] Hanna M. Wallach. Structured Topic Models for Language. PhD thesis, University of Cambridge, 2008.
[34] Hanna M. Wallach, Iain Murray, Ruslan Salakhutdinov, and David Mimno. Evaluation methods for topic models. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 1105–1112. ACM, 2009.
[35] Greg C. G. Wei and Martin A. Tanner. A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms. Journal of the American Statistical Association, 85:699–704, 1990.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases, these data have been complemented or enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.