Dynamics of Bayesian updating with dependent data and misspecified models. (English) Zbl 1326.62017

Summary: Much is now known about the consistency of Bayesian updating on infinite-dimensional parameter spaces with independent or Markovian data. Necessary conditions for consistency include the prior putting enough weight on the correct neighborhoods of the data-generating distribution; various sufficient conditions further restrict the prior in ways analogous to capacity control in frequentist nonparametrics. The asymptotics of Bayesian updating with mis-specified models or priors, or non-Markovian data, are far less well explored. Here I establish sufficient conditions for posterior convergence when all hypotheses are wrong, and the data have complex dependencies. The main dynamical assumption is the asymptotic equipartition (Shannon-McMillan-Breiman) property of information theory. This, along with Egorov’s Theorem on uniform convergence, lets me build a sieve-like structure for the prior. The main statistical assumption, also a form of capacity control, concerns the compatibility of the prior and the data-generating process, controlling the fluctuations in the log-likelihood when averaged over the sieve-like sets. In addition to posterior convergence, I derive a kind of large deviations principle for the posterior measure, extending in some cases to rates of convergence, and discuss the advantages of predicting using a combination of models known to be wrong. An appendix sketches connections between these results and the replicator dynamics of evolutionary theory.
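The setting described in the summary can be illustrated with a toy sketch; it is not the paper's construction. It assumes a symmetric two-state Markov chain as the data source and a finite set of i.i.d. Bernoulli hypotheses, so every hypothesis is wrong about the dependence. The function names, the parameter grid, and the chain settings are all hypothetical choices made for illustration. The update is written in replicator form: each hypothesis's weight grows in proportion to its likelihood relative to the prior-weighted average likelihood. In this toy case the Bernoulli hypothesis with parameter 0.5 matches the chain's stationary frequency, so it has the smallest divergence rate among the candidates, and the posterior should pile up on it.

```python
import random


def simulate_markov_chain(n, p_stay, seed=0):
    """Sample n steps of a symmetric two-state chain (illustrative data source).

    The chain keeps its current state with probability p_stay, so successive
    observations are dependent and the stationary frequency of 1s is 0.5.
    """
    rng = random.Random(seed)
    path = [rng.randint(0, 1)]
    for _ in range(n - 1):
        stay = rng.random() < p_stay
        path.append(path[-1] if stay else 1 - path[-1])
    return path


def bayes_update(weights, likelihoods):
    """One Bayesian update in replicator form.

    Each weight is multiplied by its hypothesis's likelihood ("fitness")
    and divided by the weighted average likelihood over all hypotheses.
    """
    mean_fitness = sum(w * l for w, l in zip(weights, likelihoods))
    return [w * l / mean_fitness for w, l in zip(weights, likelihoods)]


def posterior_over_bernoulli(data, thetas):
    """Posterior over i.i.d. Bernoulli(theta) hypotheses, uniform prior.

    All hypotheses ignore the dependence in the data, so all are misspecified.
    Weights of poor hypotheses may underflow to 0.0, which is harmless here.
    """
    weights = [1.0 / len(thetas)] * len(thetas)
    for x in data:
        likelihoods = [t if x == 1 else 1.0 - t for t in thetas]
        weights = bayes_update(weights, likelihoods)
    return weights


if __name__ == "__main__":
    thetas = [0.1, 0.3, 0.5, 0.7, 0.9]  # hypothetical grid of wrong models
    data = simulate_markov_chain(5000, p_stay=0.8, seed=1)
    for theta, w in zip(thetas, posterior_over_bernoulli(data, thetas)):
        print(f"theta={theta:.1f}  posterior weight={w:.3g}")
```

Normalizing at every step keeps the surviving weights well scaled; a log-space version would be the safer choice for longer sequences or finer grids.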

MSC:

62C10 Bayesian problems; characterization of Bayes procedures
62G20 Asymptotic properties of nonparametric inference
62M09 Non-Markovian processes: estimation
60F10 Large deviations
62M05 Markov processes: estimation; hidden Markov models
92D15 Problems related to evolution
94A17 Measures of information, entropy

References:

[1] Algoet, P. H. and Cover, T. M. (1988). A sandwich proof of the Shannon-McMillan-Breiman theorem., Annals of Probability 16 , 899-909. http://projecteuclid.org/euclid.aop/1176991794. · Zbl 0653.28013 · doi:10.1214/aop/1176991794
[2] Arora, S., Hazan, E., and Kale, S. (2005). The multiplicative weights update method: a meta algorithm and applications., http://www.cs.princeton.edu/~arora/pubs/MWsurvey.pdf. · Zbl 1283.68414
[3] Badii, R. and Politi, A. (1997)., Complexity: Hierarchical Structures and Scaling in Physics . Cambridge University Press, Cambridge, England. · Zbl 1042.82500
[4] Barron, A., Schervish, M. J., and Wasserman, L. (1999). The consistency of posterior distributions in nonparametric problems., The Annals of Statistics 27 , 536-561. http://projecteuclid.org/euclid.aos/1018031206. · Zbl 0980.62039 · doi:10.1214/aos/1018031206
[5] Berk, R. H. (1966). Limiting behavior of posterior distributions when the model is incorrect., Annals of Mathematical Statistics 37 , 51-58. See also correction, volume 37 (1966), pp. 745-746, http://projecteuclid.org/euclid.aoms/1177699597. · Zbl 0151.23802 · doi:10.1214/aoms/1177699597
[6] Berk, R. H. (1970). Consistency a posteriori., Annals of Mathematical Statistics 41 , 894-906. http://projecteuclid.org/euclid.aoms/1177696967. · Zbl 0214.45703 · doi:10.1214/aoms/1177696967
[7] Blackwell, D. and Dubins, L. (1962). Merging of opinion with increasing information., Annals of Mathematical Statistics 33 , 882-886. http://projecteuclid.org/euclid.aoms/1177704456. · Zbl 0109.35704 · doi:10.1214/aoms/1177704456
[8] Börgers, T. and Sarin, R. (1997). Learning through reinforcement and replicator dynamics., Journal of Economic Theory 77 , 1-14. · Zbl 0892.90198 · doi:10.1006/jeth.1997.2319
[9] Borkar, V. S. (2002). Reinforcement learning in Markovian evolutionary games., Advances in Complex Systems 5 , 55-72. · Zbl 1053.91022 · doi:10.1142/S0219525902000535
[10] Cesa-Bianchi, N. and Lugosi, G. (2006)., Prediction, Learning, and Games . Cambridge University Press, Cambridge, England. · Zbl 1114.91001 · doi:10.1017/CBO9780511546921
[11] Chamley, C. (2004)., Rational Herds: Economic Models of Social Learning . Cambridge University Press, Cambridge, England.
[12] Charniak, E. (1993)., Statistical Language Learning . MIT Press, Cambridge, Massachusetts.
[13] Choi, T. and Ramamoorthi, R. V. (2008). Remarks on consistency of posterior distributions. In, Pushing the Limits of Contemporary Statistics: Contributions in Honor of Jayanta K. Ghosh , B. Clarke and S. Ghosal, Eds. Institute of Mathematical Statistics, Beechwood, Ohio, 170-186. http://arxiv.org/abs/0805.3248. · doi:10.1214/074921708000000138
[14] Choudhuri, N., Ghosal, S., and Roy, A. (2004). Bayesian estimation of the spectral density of a time series., Journal of the American Statistical Association 99 , 1050-1059. http://www4.stat.ncsu.edu/~sghosal/papers/specden.pdf. · Zbl 1055.62100 · doi:10.1198/016214504000000557
[15] Crutchfield, J. P. (1992). Semantics and thermodynamics. In, Nonlinear Modeling and Forecasting , M. Casdagli and S. Eubank, Eds. Addison-Wesley, Reading, Massachusetts, 317-359.
[16] Daw, C. S., Finney, C. E. A., and Tracy, E. R. (2003). A review of symbolic analysis of experimental data., Review of Scientific Instruments 74 , 916-930. http://www-chaos.engr.utk.edu/abs/abs-rsi2002.html.
[17] Dȩbowski, Ł. (2006). Ergodic decomposition of excess entropy and conditional mutual information. Tech. Rep. 993, Institute of Computer Science, Polish Academy of Sciences (IPI PAN)., http://www.ipipan.waw.pl/~ldebowsk/docs/raporty/ee_report.pdf.
[18] Diaconis, P. and Freedman, D. (1986). On the consistency of Bayes estimates., The Annals of Statistics 14 , 1-26. http://projecteuclid.org/euclid.aos/1176349830. · Zbl 0595.62022 · doi:10.1214/aos/1176349830
[19] Doob, J. L. (1949). Application of the theory of martingales. In, Colloques Internationaux du Centre National de la Recherche Scientifique . Vol. 13 . Centre National de la Recherche Scientifique, Paris, 23-27. · Zbl 0041.45101
[20] Dynkin, E. B. (1978). Sufficient statistics and extreme points., Annals of Probability 6 , 705-730. http://projecteuclid.org/euclid.aop/1176995424. · Zbl 0403.62009 · doi:10.1214/aop/1176995424
[21] Earman, J. (1992)., Bayes or Bust? A Critical Account of Bayesian Confirmation Theory . MIT Press, Cambridge, Massachusetts.
[22] Eichelsbacher, P. and Ganesh, A. (2002). Moderate deviations for Bayes posteriors., Scandinavian Journal of Statistics 29 , 153-167. · Zbl 1017.62006 · doi:10.1111/1467-9469.00278
[23] Fisher, R. A. (1958)., The Genetical Theory of Natural Selection , Second ed. Dover, New York. First edition published Oxford: Clarendon Press, 1930. · JFM 56.1106.13
[24] Fraser, A. M. (2008)., Hidden Markov Models and Dynamical Systems . SIAM Press, Philadelphia. · Zbl 1156.62065 · doi:10.1137/1.9780898717747
[25] Geman, S. and Hwang, C.-R. (1982). Nonparametric maximum likelihood estimation by the method of sieves., The Annals of Statistics 10 , 401-414. http://projecteuclid.org/euclid.aos/1176345782. · Zbl 0494.62041 · doi:10.1214/aos/1176345782
[26] Ghosal, S., Ghosh, J. K., and Ramamoorthi, R. V. (1999). Consistency issues in Bayesian nonparametrics. In, Asymptotics, Nonparametrics and Time Series: A Tribute to Madan Lal Puri , S. Ghosh, Ed. Marcel Dekker, 639-667. http://www4.stat.ncsu.edu/~sghosal/papers/review.pdf. · Zbl 1069.62516
[27] Ghosal, S., Ghosh, J. K., and van der Vaart, A. W. (2000). Convergence rates of posterior distributions., Annals of Statistics 28 , 500-531. http://projecteuclid.org/euclid.aos/1016218228. · Zbl 1105.62315 · doi:10.1214/aos/1016218228
[28] Ghosal, S. and Tang, Y. (2006). Bayesian consistency for Markov processes., Sankhya 68 , 227-239. http://sankhya.isical.ac.in/search/68_2/2006010.html. · Zbl 1193.62035
[29] Ghosal, S. and van der Vaart, A. (2007). Convergence rates of posterior distributions for non-iid observations., Annals of Statistics 35 , 192-223. http://arxiv.org/abs/0708.0491. · Zbl 1114.62060 · doi:10.1214/009053606000001172
[30] Ghosh, J. K. and Ramamoorthi, R. V. (2003)., Bayesian Nonparametrics . Springer Verlag, New York. · Zbl 1029.62004
[31] Gray, R. M. (1988)., Probability, Random Processes, and Ergodic Properties . Springer-Verlag, New York. http://ee.stanford.edu/~gray/arp.html. · Zbl 0644.60001
[32] Gray, R. M. (1990)., Entropy and Information Theory . Springer-Verlag, New York. http://ee.stanford.edu/~gray/it.html. · Zbl 0722.94001
[33] Haldane, J. B. S. (1954). The measurement of natural selection. In, Proceedings of the 9th International Congress of Genetics . Vol. 1 . 480-487.
[34] Hofbauer, J. and Sigmund, K. (1998)., Evolutionary Games and Population Dynamics . Cambridge University Press, Cambridge, England. · Zbl 0914.90287 · doi:10.1017/CBO9781139173179
[35] Kallenberg, O. (2002)., Foundations of Modern Probability , Second ed. Springer-Verlag, New York. · Zbl 0996.60001
[36] Kitchens, B. and Tuncel, S. (1985)., Finitary Measures for Subshifts of Finite Type and Sofic Systems . Memoirs of the American Mathematical Society, Vol. 338 . American Mathematical Society, Providence, Rhode Island. · Zbl 0594.28024
[37] Kitchens, B. P. (1998)., Symbolic Dynamics: One-sided, Two-sided and Countable State Markov Shifts . Springer-Verlag, Berlin. · Zbl 0892.58020
[38] Kleijn, B. J. K. and van der Vaart, A. W. (2006). Misspecification in infinite-dimensional Bayesian statistics., Annals of Statistics 34 , 837-877. http://arxiv.org/math.ST/0607023. · Zbl 1095.62031 · doi:10.1214/009053606000000029
[39] Knight, F. B. (1975). A predictive view of continuous time processes., Annals of Probability 3 , 573-596. http://projecteuclid.org/euclid.aop/1176996302. · Zbl 0317.60018 · doi:10.1214/aop/1176996302
[40] Krogh, A. and Vedelsby, J. (1995). Neural network ensembles, cross validation, and active learning. In, Advances in Neural Information Processing 7 [NIPS 1994] , G. Tesauro, D. Touretzky, and T. Leen, Eds. MIT Press, Cambridge, Massachusetts, 231-238. http://books.nips.cc/papers/files/nips07/0231.pdf.
[41] Lian, H. (2007). On rates of convergence for posterior distributions under misspecification. E-print, arxiv.org., http://arxiv.org/abs/math.ST/0702126. · Zbl 1167.62045 · doi:10.1080/03610920802478375
[42] Lijoi, A., Prünster, I., and Walker, S. G. (2007). Bayesian consistency for stationary models., Econometric Theory 23 , 749-759. · Zbl 1237.62033 · doi:10.1017/S0266466607070314
[43] Lind, D. and Marcus, B. (1995)., An Introduction to Symbolic Dynamics and Coding . Cambridge University Press, Cambridge, England. · Zbl 1106.37301 · doi:10.1017/CBO9780511626302
[44] Marton, K. and Shields, P. C. (1994). Entropy and the consistent estimation of joint distributions., The Annals of Probability 22 , 960-977. Correction, The Annals of Probability , 24 (1996): 541-545, http://projecteuclid.org/euclid.aop/1176988736. · Zbl 0861.28015 · doi:10.1214/aop/1176988736
[45] McAllester, D. A. (1999). Some PAC-Bayesian theorems., Machine Learning 37 , 355-363. · Zbl 0945.68157
[46] Meir, R. (2000). Nonparametric time series prediction through adaptive model selection., Machine Learning 39 , 5-34. http://www.ee.technion.ac.il/~rmeir/Publications/MeirTimeSeries00.pdf. · Zbl 0954.68124 · doi:10.1023/A:1007602715810
[47] Ornstein, D. S. and Weiss, B. (1990). How sampling reveals a process., The Annals of Probability 18 , 905-930. http://projecteuclid.org/euclid.aop/1176990729. · Zbl 0709.60036 · doi:10.1214/aop/1176990729
[48] Page, S. E. (2007)., The Difference: How the Power of Diversity Creates Better Groups, Firms, Schools, and Societies . Princeton University Press, Princeton, New Jersey.
[49] Papangelou, F. (1996). Large deviations and the Bayesian estimation of higher-order Markov transition functions., Journal of Applied Probability 33 , 18-27. http://www.jstor.org/stable/3215260. · Zbl 0845.60025 · doi:10.2307/3215260
[50] Perry, N. and Binder, P.-M. (1999). Finite statistical complexity for sofic systems., Physical Review E 60 , 459-463.
[51] Rivers, D. and Vuong, Q. H. (2002). Model selection tests for nonlinear dynamic models., The Econometrics Journal 5 , 1-39. · Zbl 1010.62110 · doi:10.1111/1368-423X.t01-1-00071
[52] Roy, A., Ghosal, S., and Rosenberger, W. F. (2009). Convergence properties of sequential Bayesian D-optimal designs., Journal of Statistical Planning and Inference 139 , 425-440. · Zbl 1149.62066 · doi:10.1016/j.jspi.2008.04.025
[53] Ryabko, D. and Ryabko, B. (2008). Testing statistical hypotheses about ergodic processes. E-print, arxiv.org, 0804.0510., http://arxiv.org/abs/0804.0510. · Zbl 1156.60312
[54] Sato, Y. and Crutchfield, J. P. (2003). Coupled replicator equations for the dynamics of learning in multiagent systems., Physical Review E 67 , 015206. http://arxiv.org/abs/nlin.AO/0204057.
[55] Schervish, M. J. (1995)., Theory of Statistics . Springer Series in Statistics. Springer-Verlag, Berlin. · Zbl 0834.62002
[56] Schwartz, L. (1965). On Bayes procedures., Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 4 , 10-26. · Zbl 0158.17606 · doi:10.1007/BF00535479
[57] Shalizi, C. R. and Crutchfield, J. P. (2001). Computational mechanics: Pattern and prediction, structure and simplicity., Journal of Statistical Physics 104 , 817-879. http://arxiv.org/abs/cond-mat/9907176. · Zbl 1100.82500 · doi:10.1023/A:1010388907793
[58] Shalizi, C. R. and Klinkner, K. L. (2004). Blind construction of optimal nonlinear recursive predictors for discrete sequences. In, Uncertainty in Artificial Intelligence: Proceedings of the Twentieth Conference (UAI 2004) , M. Chickering and J. Y. Halpern, Eds. AUAI Press, Arlington, Virginia, 504-511. http://arxiv.org/abs/cs.LG/0406011.
[59] Shen, X. and Wasserman, L. (2001). Rates of convergence of posterior distributions., Annals of Statistics 29 , 687-714. http://projecteuclid.org/euclid.aos/1009210686. · Zbl 1041.62022 · doi:10.1214/aos/1009210686
[60] Shields, P. C. (1996)., The Ergodic Theory of Discrete Sample Paths . American Mathematical Society, Providence, Rhode Island. · Zbl 0879.28031
[61] Strelioff, C. C., Crutchfield, J. P., and Hübler, A. W. (2007). Inferring Markov chains: Bayesian estimation, model comparison, entropy rate, and out-of-class modeling., Physical Review E 76 , 011106. http://arxiv.org/math.ST/0703715.
[62] Varn, D. P. and Crutchfield, J. P. (2004). From finite to infinite range order via annealing: The causal architecture of deformation faulting in annealed close-packed crystals., Physics Letters A 324 , 299-307. http://arxiv.org/abs/cond-mat/0307296. · Zbl 1123.82363 · doi:10.1016/j.physleta.2004.02.077
[63] Vidyasagar, M. (2003)., Learning and Generalization: With Applications to Neural Networks , Second ed. Springer-Verlag, Berlin. · Zbl 0928.68061
[64] Vuong, Q. H. (1989). Likelihood ratio tests for model selection and non-nested hypotheses., Econometrica 57 , 307-333. http://www.jstor.org/pss/1912557. · Zbl 0701.62106 · doi:10.2307/1912557
[65] Walker, S. (2004). New approaches to Bayesian consistency., Annals of Statistics 32 , 2028-2043. http://arxiv.org/abs/math.ST/0503672. · Zbl 1056.62040 · doi:10.1214/009053604000000409
[66] Weiss, B. (1973). Subshifts of finite type and sofic systems., Monatshefte für Mathematik 77 , 462-474. · Zbl 0285.28021 · doi:10.1007/BF01295322
[67] Xing, Y. and Ranneby, B. (2008). Both necessary and sufficient conditions for Bayesian exponential consistency., http://arxiv.org/abs/0812.1084. · Zbl 1161.62016
[68] Zhang, T. (2006). From ε-entropy to KL-entropy: Analysis of minimum information complexity density estimation., Annals of Statistics 34 , 2180-2210. http://arxiv.org/math.ST/0702653. · Zbl 1106.62005 · doi:10.1214/009053606000000704
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.