Complexity analysis of Bayesian learning of high-dimensional DAG models and their equivalence classes. (English) Zbl 1539.62148

Summary: Structure learning via MCMC sampling is known to be very challenging because of the enormous search space and the existence of Markov equivalent DAGs. Theoretical results on the mixing behavior are lacking. In this work, we prove the rapid mixing of a random walk Metropolis-Hastings algorithm, which reveals that the complexity of Bayesian learning of sparse equivalence classes grows only polynomially in \(n\) and \(p\), under some high-dimensional assumptions. A series of high-dimensional consistency results is obtained, including the strong selection consistency of an empirical Bayes model for structure learning. Our proof is based on two new results. First, we derive a general mixing time bound on finite-state spaces, which can be applied to local MCMC schemes for other model selection problems. Second, we construct high-probability search paths on the space of equivalence classes with node degree constraints by proving a combinatorial property of DAG comparisons. Simulation studies on the proposed MCMC sampler are conducted to illustrate the main theoretical findings.
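The paper's sampler uses specialized moves on the space of equivalence classes; as a self-contained illustration of the generic random walk Metropolis-Hastings scheme it builds on, the following minimal sketch runs such a chain on a toy finite model space of binary edge-inclusion vectors. All names, the toy score, and the "true" structure are invented for illustration and are not from the paper.

```python
import math
import random

def mh_map_search(log_score, neighbors, init, n_iter, rng):
    """Random walk Metropolis-Hastings over a finite model space.

    `log_score(state)` is an unnormalized log posterior and `neighbors(state)`
    lists the states reachable in one local move.  The neighborhood-size
    correction in the Hastings ratio keeps the chain reversible when degrees
    differ.  Returns the final state and the best-scoring state visited.
    """
    state, best = init, init
    for _ in range(n_iter):
        nbrs = neighbors(state)
        prop = nbrs[rng.randrange(len(nbrs))]
        # log acceptance ratio: posterior ratio times proposal-degree ratio
        log_alpha = (log_score(prop) - log_score(state)
                     + math.log(len(nbrs)) - math.log(len(neighbors(prop))))
        if math.log(rng.random()) < log_alpha:
            state = prop
            if log_score(state) > log_score(best):
                best = state
    return state, best

# Toy model space: binary vectors of length p, with a made-up log score that
# rewards agreement with a hypothetical sparse true structure.
p = 6
true_model = (1, 1, 0, 0, 0, 0)

def log_score(s):
    return 10.0 * sum(a == b for a, b in zip(s, true_model))

def neighbors(s):
    # single-coordinate flips; every state has exactly p neighbors here
    return [s[:i] + (1 - s[i],) + s[i + 1:] for i in range(p)]

rng = random.Random(0)
final, best = mh_map_search(log_score, neighbors, (0,) * p, 2000, rng)
```

On this sharply peaked toy posterior the chain typically recovers the true support quickly; the paper's contribution is to bound how the analogous mixing time scales with \(n\) and \(p\) under its high-dimensional assumptions.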

MSC:

62H12 Estimation in multivariate analysis
62F15 Bayesian inference
62J05 Linear regression; mixed models
62F12 Asymptotic properties of parametric estimators
68T05 Learning and adaptive systems in artificial intelligence

Software:

TETRAD

References:

[1] AGRAWAL, R. and UHLER, C. (2018). Minimal I-MAP MCMC for scalable structure discovery in causal DAG models. In International Conference on Machine Learning 89-98.
[2] ANDERSSON, S. A., MADIGAN, D. and PERLMAN, M. D. (1997). A characterization of Markov equivalence classes for acyclic digraphs. Ann. Statist. 25 505-541. · Zbl 0876.60095 · doi:10.1214/aos/1031833662
[3] ARAGAM, B., AMINI, A. and ZHOU, Q. (2019). Globally optimal score-based learning of directed acyclic graphs in high-dimensions. In Advances in Neural Information Processing Systems 4450-4462.
[4] BANERJEE, S., CASTILLO, I. and GHOSAL, S. (2021). Bayesian inference in high-dimensional models. arXiv preprint. Available at arXiv:2101.04491.
[5] BANERJEE, S. and GHOSAL, S. (2015). Bayesian structure learning in graphical models. J. Multivariate Anal. 136 147-162. · Zbl 1308.62119 · doi:10.1016/j.jmva.2015.01.015
[6] Bickel, P. J. and Levina, E. (2008). Regularized estimation of large covariance matrices. Ann. Statist. 36 199-227. · Zbl 1132.62040 · doi:10.1214/009053607000000758
[7] Cao, X., Khare, K. and Ghosh, M. (2019). Posterior graph selection and estimation consistency for high-dimensional Bayesian DAG models. Ann. Statist. 47 319-348. · Zbl 1417.62140 · doi:10.1214/18-AOS1689
[8] CASTELLETTI, F., CONSONNI, G., DELLA VEDOVA, M. L. and PELUSO, S. (2018). Learning Markov equivalence classes of directed acyclic graphs: An objective Bayes approach. Bayesian Anal. 13 1235-1260. · Zbl 1407.62189 · doi:10.1214/18-BA1101
[9] CASTELO, R. and KOČKA, T. (2003). On inclusion-driven learning of Bayesian networks. J. Mach. Learn. Res. 4 527-574. · Zbl 1102.68536 · doi:10.1162/153244304773936045
[10] CHICKERING, D. M. (2002). Learning equivalence classes of Bayesian-network structures. J. Mach. Learn. Res. 2 445-498. · Zbl 1007.68179 · doi:10.1162/153244302760200696
[11] CHICKERING, D. M. (2002). Optimal structure identification with greedy search. J. Mach. Learn. Res. 3 507-554. · Zbl 1084.68519
[12] DRTON, M., FOYGEL, R. and SULLIVANT, S. (2011). Global identifiability of linear structural equation models. Ann. Statist. 39 865-886. · Zbl 1215.62052 · doi:10.1214/10-AOS859
[13] DRTON, M. and MAATHUIS, M. H. (2017). Structure learning in graphical modeling. Annu. Rev. Stat. Appl. 4 365-393.
[14] EATON, D. and MURPHY, K. (2007). Bayesian structure learning using dynamic programming and MCMC. In Proceedings of the Twenty-Third Conference on Uncertainty in Artificial Intelligence 101-108.
[15] ELLIS, B. and WONG, W. H. (2008). Learning causal Bayesian network structures from experimental data. J. Amer. Statist. Assoc. 103 778-789. · Zbl 1471.62056 · doi:10.1198/016214508000000193
[16] FRIEDMAN, N. and KOLLER, D. (2003). Being Bayesian about network structure: A Bayesian approach to structure discovery in Bayesian networks. Mach. Learn. 50 95-125. · Zbl 1033.68104
[17] GAO, B. and CUI, Y. (2015). Learning directed acyclic graphical structures with genetical genomics data. Bioinformatics 31 3953-3960. · doi:10.1093/bioinformatics/btv513
[18] Gao, C., van der Vaart, A. W. and Zhou, H. H. (2020). A general framework for Bayes structured linear models. Ann. Statist. 48 2848-2878. · Zbl 1471.62241 · doi:10.1214/19-AOS1909
[19] GEIGER, D. and HECKERMAN, D. (2002). Parameter priors for directed acyclic graphical models and the characterization of several probability distributions. Ann. Statist. 30 1412-1440. · Zbl 1016.62064 · doi:10.1214/aos/1035844981
[20] GIUDICI, P. and CASTELO, R. (2003). Improving Markov chain Monte Carlo model search for data mining. Mach. Learn. 50 127-158. · Zbl 1050.68120
[21] GOUDIE, R. J. B. and MUKHERJEE, S. (2016). A Gibbs sampler for learning DAGs. J. Mach. Learn. Res. 17 Paper No. 30, 39. · Zbl 1360.68677
[22] GRZEGORCZYK, M. and HUSMEIER, D. (2008). Improving the structure MCMC sampler for Bayesian networks by introducing a new edge reversal move. Mach. Learn. 71 265. · Zbl 1470.62075
[23] HE, Y., JIA, J. and YU, B. (2013). Reversible MCMC on Markov equivalence classes of sparse directed acyclic graphs. Ann. Statist. 41 1742-1779. · Zbl 1360.62369 · doi:10.1214/13-AOS1125
[24] HOETING, J. A., MADIGAN, D., RAFTERY, A. E. and VOLINSKY, C. T. (1999). Bayesian model averaging: A tutorial. Statist. Sci. 14 382-417. · Zbl 1059.62525 · doi:10.1214/ss/1009212519
[25] JEONG, S. and GHOSAL, S. (2021). Posterior contraction in sparse generalized linear models. Biometrika 108 367-379. · Zbl 07458260 · doi:10.1093/biomet/asaa074
[26] Johnson, V. E. and Rossell, D. (2012). Bayesian model selection in high-dimensional settings. J. Amer. Statist. Assoc. 107 649-660. · Zbl 1261.62024 · doi:10.1080/01621459.2012.682536
[27] KAHALE, N. (1997). A semidefinite bound for mixing rates of Markov chains. Random Structures Algorithms 11 299-313. · Zbl 0889.90154 · doi:10.1002/(SICI)1098-2418(199712)11:4<299::AID-RSA2>3.0.CO;2-U
[28] Kalisch, M. and Bühlmann, P. (2007). Estimating high-dimensional directed acyclic graphs with the PC-algorithm. J. Mach. Learn. Res. 8 613-636. · Zbl 1222.68229
[29] Koller, D. and Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques. Adaptive Computation and Machine Learning. MIT Press, Cambridge, MA. · Zbl 1183.68483
[30] KUIPERS, J. and MOFFA, G. (2017). Partition MCMC for inference on acyclic digraphs. J. Amer. Statist. Assoc. 112 282-299. · doi:10.1080/01621459.2015.1133426
[31] Lam, C. and Fan, J. (2009). Sparsistency and rates of convergence in large covariance matrix estimation. Ann. Statist. 37 4254-4278. · Zbl 1191.62101 · doi:10.1214/09-AOS720
[32] LAURITZEN, S. L. (1992). Propagation of probabilities, means, and variances in mixed graphical association models. J. Amer. Statist. Assoc. 87 1098-1108. · Zbl 0850.62094
[33] LEE, K., LEE, J. and LIN, L. (2019). Minimax posterior convergence rates and model selection consistency in high-dimensional DAG models based on sparse Cholesky factors. Ann. Statist. 47 3413-3437. · Zbl 1435.62037 · doi:10.1214/18-AOS1783
[34] LIU, S., SUZUKI, T., RELATOR, R., SESE, J., SUGIYAMA, M. and FUKUMIZU, K. (2017). Support consistency of direct sparse-change learning in Markov networks. Ann. Statist. 45 959-990. · Zbl 1371.62022 · doi:10.1214/16-AOS1470
[35] MAATHUIS, M. H., COLOMBO, D., KALISCH, M. and BÜHLMANN, P. (2010). Predicting causal effects in large-scale systems from observational data. Nat. Methods 7 247-248. · doi:10.1038/nmeth0410-247
[36] MADIGAN, D., YORK, J. and ALLARD, D. (1995). Bayesian graphical models for discrete data. Int. Stat. Rev. 215-232. · Zbl 0834.62003
[37] MADIGAN, D., ANDERSSON, S. A., PERLMAN, M. D. and VOLINSKY, C. T. (1996). Bayesian model averaging and model selection for Markov equivalence classes of acyclic digraphs. Comm. Statist. Theory Methods 25 2493-2519. · Zbl 0894.62032
[38] Martin, R., Mess, R. and Walker, S. G. (2017). Empirical Bayes posterior concentration in sparse high-dimensional linear models. Bernoulli 23 1822-1847. · Zbl 1450.62085
[39] MEEK, C. (1997). Graphical models: Selecting causal and statistical models. Ph.D. thesis, Carnegie Mellon Univ., Pittsburgh, PA.
[40] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist. 34 1436-1462. · Zbl 1113.62082 · doi:10.1214/009053606000000281
[41] MUNTEANU, P. and BENDOU, M. (2001). The EQ framework for learning equivalence classes of Bayesian networks. In Proceedings 2001 IEEE International Conference on Data Mining 417-424. IEEE, San Jose, CA.
[42] NANDY, P., HAUSER, A. and MAATHUIS, M. H. (2018). High-dimensional consistency in score-based and hybrid structure learning. Ann. Statist. 46 3151-3183. · Zbl 1411.62144 · doi:10.1214/17-AOS1654
[43] Narisetty, N. N. and He, X. (2014). Bayesian variable selection with shrinking and diffusing priors. Ann. Statist. 42 789-817. · Zbl 1302.62158 · doi:10.1214/14-AOS1207
[44] NIINIMÄKI, T. M., PARVIAINEN, P. and KOIVISTO, M. (2011). Partial order MCMC for structure discovery in Bayesian networks. In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence (UAI-11) 557-564. AUAI Press, Barcelona, Spain. · Zbl 1395.62057
[45] Pati, D., Bhattacharya, A., Pillai, N. S. and Dunson, D. (2014). Posterior contraction in sparse Bayesian factor models for massive covariance matrices. Ann. Statist. 42 1102-1130. · Zbl 1305.62124 · doi:10.1214/14-AOS1215
[46] Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. The Morgan Kaufmann Series in Representation and Reasoning. Morgan Kaufmann, San Mateo, CA.
[47] PELUSO, S. and CONSONNI, G. (2020). Compatible priors for model selection of high-dimensional Gaussian DAGs. Electron. J. Stat. 14 4110-4132. · Zbl 1455.62062 · doi:10.1214/20-EJS1768
[48] PEÑA, J. M. (2007). Approximate counting of graphical models via MCMC. In AISTATS 355-362.
[49] PERLMAN, M. D. (2001). Graphical model search via essential graphs. In Algebraic Methods in Statistics and Probability (Notre Dame, IN, 2000). Contemp. Math. 287 255-265. Amer. Math. Soc., Providence, RI. · Zbl 1018.05098 · doi:10.1090/conm/287/04790
[50] RASKUTTI, G., YU, B. and WAINWRIGHT, M. J. (2008). Model selection in Gaussian graphical models: High-dimensional consistency of \(\ell_1\)-regularized MLE. Adv. Neural Inf. Process. Syst. 21.
[51] SCUTARI, M., GRAAFLAND, C. E. and GUTIÉRREZ, J. M. (2019). Who learns better Bayesian network structures: Accuracy and speed of structure learning algorithms. Internat. J. Approx. Reason. 115 235-253. · Zbl 1470.68171 · doi:10.1016/j.ijar.2019.10.003
[52] Sinclair, A. (1992). Improved bounds for mixing rates of Markov chains and multicommodity flow. Combin. Probab. Comput. 1 351-370. · Zbl 0801.90039 · doi:10.1017/S0963548300000390
[53] SOLUS, L., WANG, Y. and UHLER, C. (2021). Consistency guarantees for greedy permutation-based causal inference algorithms. Biometrika 108 795-814. · Zbl 07459733 · doi:10.1093/biomet/asaa104
[54] Spirtes, P., Glymour, C. and Scheines, R. (2000). Causation, Prediction, and Search, 2nd ed. Adaptive Computation and Machine Learning. MIT Press, Cambridge, MA. · Zbl 0806.62001
[55] Studený, M. (2005). Probabilistic Conditional Independence Structures. Information Science and Statistics. Springer, London. · Zbl 1070.62001
[56] SU, C. and BORSUK, M. E. (2016). Improving structure MCMC for Bayesian networks through Markov blanket resampling. J. Mach. Learn. Res. 17 Paper No. 118, 20. · Zbl 1367.68238
[57] Sun, T. and Zhang, C.-H. (2013). Sparse matrix inversion with scaled Lasso. J. Mach. Learn. Res. 14 3385-3418. · Zbl 1318.62184
[58] TALWAR, K. (2019). Computational separations between sampling and optimization. Adv. Neural Inf. Process. Syst. 32.
[59] UHLER, C., RASKUTTI, G., BÜHLMANN, P. and YU, B. (2013). Geometry of the faithfulness assumption in causal inference. Ann. Statist. 41 436-463. · Zbl 1267.62068 · doi:10.1214/12-AOS1080
[60] VAN DE GEER, S. and BÜHLMANN, P. (2013). \( \ell_0\)-penalized maximum likelihood for sparse directed acyclic graphs. Ann. Statist. 41 536-567. · Zbl 1267.62037 · doi:10.1214/13-AOS1085
[61] VERMA, T. and PEARL, J. (1991). Equivalence and synthesis of causal models. Technical report, UCLA, Computer Science Department. · Zbl 07672243
[62] Vershynin, R. (2012). Introduction to the non-asymptotic analysis of random matrices. In Compressed Sensing 210-268. Cambridge Univ. Press, Cambridge.
[63] WAINWRIGHT, M. J. (2009). Sharp thresholds for high-dimensional and noisy sparsity recovery using \(\ell_1\)-constrained quadratic programming (Lasso). IEEE Trans. Inf. Theory 55 2183-2202. · Zbl 1367.62220 · doi:10.1109/TIT.2009.2016018
[64] Yang, Y. and Tokdar, S. T. (2015). Minimax-optimal nonparametric regression in high dimensions. Ann. Statist. 43 652-674. · Zbl 1312.62052 · doi:10.1214/14-AOS1289
[65] Yang, Y., Wainwright, M. J. and Jordan, M. I. (2016). On the computational complexity of high-dimensional Bayesian variable selection. Ann. Statist. 44 2497-2532. · Zbl 1359.62088
[66] Zanella, G. (2020). Informed proposals for local MCMC in discrete spaces. J. Amer. Statist. Assoc. 115 852-865. · Zbl 1445.60053 · doi:10.1080/01621459.2019.1585255
[67] ZHOU, Q. and CHANG, H. (2023). Supplement to “Complexity analysis of Bayesian learning of high-dimensional DAG models and their equivalence classes.” https://doi.org/10.1214/23-AOS2280SUPP · Zbl 1539.62148
[68] ZHOU, Q., YANG, J., VATS, D., ROBERTS, G. O. and ROSENTHAL, J. S. (2022). Dimension-free mixing for high-dimensional Bayesian variable selection. J. R. Stat. Soc. Ser. B. Stat. Methodol. 84 1751-1784. · Zbl 07686598 · doi:10.1111/rssb.12546
[69] ZHUO, B. and GAO, C. (2021). Mixing time of Metropolis-Hastings for Bayesian community detection. J. Mach. Learn. Res. 22 Paper No. 10, 89. · Zbl 1539.68290