Borrowing strengh in hierarchical Bayes: posterior concentration of the Dirichlet base measure. (English) Zbl 1360.62103

Summary: This paper studies the posterior concentration behavior of the base probability measure of a Dirichlet measure, given observations associated with the sampled Dirichlet processes, as the number of observations tends to infinity. The base measure itself is endowed with another Dirichlet prior, a construction known as the hierarchical Dirichlet process (Y. W. Teh et al. [J. Am. Stat. Assoc. 101, No. 476, 1566–1581 (2006; Zbl 1171.62349)]). Convergence rates are established in transportation distances (i.e., Wasserstein metrics) under various conditions on the geometry of the support of the true base measure. As a consequence of the theory, we demonstrate the benefit of “borrowing strength” in the inference of multiple groups of data – a powerful insight often invoked to motivate hierarchical modeling. In certain settings, the gain in efficiency due to the latent hierarchy can be dramatic, improving from a standard nonparametric rate to a parametric rate of convergence. Tools developed include transportation distances for nonparametric Bayesian hierarchies of random measures, the existence of tests for Dirichlet measures, and geometric properties of the support of Dirichlet measures.
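
For orientation, the hierarchical construction and the metric named in the summary can be sketched in standard notation. The symbols \(H\), \(\gamma\), \(\alpha\), \(G\), \(Q_i\), \(m\), \(\Theta\), \(\rho\) and \(\kappa\) are assumptions chosen here for illustration; they do not appear in this record, and the paper's own notation may differ. A base measure \(G\) with a Dirichlet prior, and \(m\) groups whose Dirichlet processes share \(G\) as their base measure, can be written as
\[ G \mid \gamma, H \sim \mathrm{DP}(\gamma, H), \qquad Q_1, \dots, Q_m \mid G \overset{\text{i.i.d.}}{\sim} \mathrm{DP}(\alpha, G), \]
with the observations in group \(i\) associated with \(Q_i\). For \(r \geq 1\), the order-\(r\) Wasserstein distance between probability measures \(G\) and \(G'\) on a metric space \((\Theta, \rho)\) is
\[ W_r(G, G') = \Bigl( \inf_{\kappa \in \mathcal{T}(G, G')} \int \rho(\theta, \theta')^{r} \, d\kappa(\theta, \theta') \Bigr)^{1/r}, \]
where \(\mathcal{T}(G, G')\) denotes the set of couplings of \(G\) and \(G'\), that is, joint distributions on \(\Theta \times \Theta\) whose marginals are \(G\) and \(G'\).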

MSC:

62F15 Bayesian inference
60B10 Convergence of probability measures
60E15 Inequalities; stochastic orderings
60G57 Random measures
62G20 Asymptotic properties of nonparametric inference

Citations:

Zbl 1171.62349

References:

[1] Barron, A., Schervish, M.J. and Wasserman, L. (1999). The consistency of posterior distributions in nonparametric problems. Ann. Statist. 27 536-561. · Zbl 0980.62039 · doi:10.1214/aos/1018031206
[2] Berger, J.O. (1993). Statistical Decision Theory and Bayesian Analysis. Springer Series in Statistics. New York: Springer. · Zbl 0572.62008
[3] Blackwell, D. and MacQueen, J.B. (1973). Ferguson distributions via Pólya urn schemes. Ann. Statist. 1 353-355. · Zbl 0276.62010 · doi:10.1214/aos/1176342372
[4] Carroll, R.J. and Hall, P. (1988). Optimal rates of convergence for deconvolving a density. J. Amer. Statist. Assoc. 83 1184-1186. · Zbl 0673.62033 · doi:10.2307/2290153
[5] Doss, H. and Sellke, T. (1982). The tails of probabilities chosen from a Dirichlet prior. Ann. Statist. 10 1302-1305. · Zbl 0515.62008 · doi:10.1214/aos/1176345996
[6] Falconer, K.J. (1986). The Geometry of Fractal Sets. Cambridge Tracts in Mathematics 85. Cambridge: Cambridge Univ. Press. · Zbl 0587.28004
[7] Fan, J. (1991). On the optimal rates of convergence for nonparametric deconvolution problems. Ann. Statist. 19 1257-1272. · Zbl 0729.62033 · doi:10.1214/aos/1176348248
[8] Ferguson, T.S. (1973). A Bayesian analysis of some nonparametric problems. Ann. Statist. 1 209-230. · Zbl 0255.62037 · doi:10.1214/aos/1176342360
[9] Garcia, I., Molter, U. and Scotto, R. (2007). Dimension functions of Cantor sets. Proc. Amer. Math. Soc. 135 3151-3161. · Zbl 1124.28006 · doi:10.1090/S0002-9939-07-09019-3
[10] Gassiat, E. and Rousseau, J. (2014). About the posterior distribution in hidden Markov models with unknown number of states. Bernoulli 20 2039-2075. · Zbl 1302.62183 · doi:10.3150/13-BEJ550
[11] Gassiat, E. and van Handel, R. (2014). The local geometry of finite mixtures. Trans. Amer. Math. Soc. 366 1047-1072. · Zbl 1291.52004 · doi:10.1090/S0002-9947-2013-06041-2
[12] Ghosal, S. (2010). The Dirichlet process, related priors and posterior asymptotics. In Bayesian Nonparametrics. Camb. Ser. Stat. Probab. Math. 35-79. Cambridge: Cambridge Univ. Press. · doi:10.1017/CBO9780511802478.003
[13] Ghosal, S., Ghosh, J.K. and van der Vaart, A.W. (2000). Convergence rates of posterior distributions. Ann. Statist. 28 500-531. · Zbl 1105.62315 · doi:10.1214/aos/1016218228
[14] Ghosh, J.K. and Ramamoorthi, R.V. (2003). Bayesian Nonparametrics. Springer Series in Statistics. New York: Springer. · Zbl 1029.62004
[15] Giné, E. and Nickl, R. (2011). Rates of contraction for posterior distributions in \(L^{r}\)-metrics, \(1\leq r\leq\infty\). Ann. Statist. 39 2883-2911. · Zbl 1246.62095 · doi:10.1214/11-AOS924
[16] Hjort, N.L., Holmes, C., Müller, P. and Walker, S.G., eds. (2010). Bayesian Nonparametrics. Cambridge Series in Statistical and Probabilistic Mathematics 28. Cambridge: Cambridge Univ. Press. · Zbl 1192.62080 · doi:10.1017/CBO9780511802478
[17] Korwar, R.M. and Hollander, M. (1973). Contributions to the theory of Dirichlet processes. Ann. Probab. 1 705-711. · Zbl 0264.60084 · doi:10.1214/aop/1176996898
[18] Lehmann, E.L. and Casella, G. (1998). Theory of Point Estimation, 2nd ed. Springer Texts in Statistics. New York: Springer. · Zbl 0916.62017
[19] Nguyen, X. (2013). Convergence of latent mixing measures in finite and infinite mixture models. Ann. Statist. 41 370-400. · Zbl 1347.62117 · doi:10.1214/12-AOS1065
[20] Nguyen, X. (2015). Posterior contraction of the population polytope in finite admixture models. Bernoulli 21 618-646. · Zbl 1368.62288 · doi:10.3150/13-BEJ582
[21] Nguyen, X. (2015). Supplement to “Borrowing strengh in hierarchical Bayes: Posterior concentration of the Dirichlet base measure.”
[22] Rousseau, J. and Mengersen, K. (2011). Asymptotic behaviour of the posterior distribution in overfitted mixture models. J. R. Stat. Soc. Ser. B. Stat. Methodol. 73 689-710. · Zbl 1228.62034 · doi:10.1111/j.1467-9868.2011.00781.x
[23] Sethuraman, J. (1994). A constructive definition of Dirichlet priors. Statist. Sinica 4 639-650. · Zbl 0823.62007
[24] Shen, X. and Wasserman, L. (2001). Rates of convergence of posterior distributions. Ann. Statist. 29 687-714. · Zbl 1041.62022 · doi:10.1214/aos/1009210686
[25] Teh, Y.W. and Jordan, M.I. (2010). Hierarchical Bayesian nonparametric models with applications. In Bayesian Nonparametrics. Camb. Ser. Stat. Probab. Math. 158-207. Cambridge: Cambridge Univ. Press. · doi:10.1017/CBO9780511802478.006
[26] Teh, Y.W., Jordan, M.I., Beal, M.J. and Blei, D.M. (2006). Hierarchical Dirichlet processes. J. Amer. Statist. Assoc. 101 1566-1581. · Zbl 1171.62349 · doi:10.1198/016214506000000302
[27] van der Vaart, A.W. and van Zanten, J.H. (2008). Rates of contraction of posterior distributions based on Gaussian process priors. Ann. Statist. 36 1435-1463. · Zbl 1141.60018 · doi:10.1214/009053607000000613
[28] van der Vaart, A.W. and Wellner, J.A. (1996). Weak Convergence and Empirical Processes. Springer Series in Statistics. New York: Springer. · Zbl 0862.60002
[29] Villani, C. (2009). Optimal Transport: Old and New. Grundlehren der Mathematischen Wissenschaften 338. Berlin: Springer. · Zbl 1156.53003 · doi:10.1007/978-3-540-71050-9
[30] Walker, S. (2004). New approaches to Bayesian consistency. Ann. Statist. 32 2028-2043. · Zbl 1056.62040 · doi:10.1214/009053604000000409
[31] Walker, S.G., Lijoi, A. and Prünster, I. (2007). On rates of convergence for posterior distributions in infinite-dimensional models. Ann. Statist. 35 738-746. · Zbl 1117.62047 · doi:10.1214/009053606000001361
[32] Wong, W.H. and Shen, X. (1995). Probability inequalities for likelihood ratios and convergence rates of sieve MLEs. Ann. Statist. 23 339-362. · Zbl 0829.62002 · doi:10.1214/aos/1176324524
[33] Zhang, C.-H. (1990). Fourier methods for estimating mixing densities and distributions. Ann. Statist. 18 806-831. · Zbl 0778.62037 · doi:10.1214/aos/1176347627
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.