×

Reversible MCMC on Markov equivalence classes of sparse directed acyclic graphs. (English) Zbl 1360.62369

Summary: Graphical models are popular statistical tools which are used to represent dependent or causal complex systems. Statistically equivalent causal or directed graphical models are said to belong to a Markov equivalent class. It is of great interest to describe and understand the space of such classes. However, with currently known algorithms, sampling over such classes is only feasible for graphs with fewer than approximately 20 vertices. In this paper, we design reversible irreducible Markov chains on the space of Markov equivalent classes by proposing a perfect set of operators that determine the transitions of the Markov chain. The stationary distribution of a proposed Markov chain has a closed form and can be computed easily. Specifically, we construct a concrete perfect set of operators on sparse Markov equivalence classes by introducing appropriate conditions on each possible operator. Algorithms and their accelerated versions are provided to efficiently generate Markov chains and to explore properties of Markov equivalence classes of sparse directed acyclic graphs (DAGs) with thousands of vertices. We find experimentally that in most Markov equivalence classes of sparse DAGs, (1) most edges are directed, (2) most undirected subgraphs are small and (3) the number of these undirected subgraphs grows approximately linearly with the number of vertices.

MSC:

62H99 Multivariate analysis
62H05 Characterization and structure theory for multivariate probability distributions; copulas
60J10 Markov chains (discrete-time Markov processes on discrete state spaces)
05C81 Random walks on graphs

Software:

TETRAD; pcalg

References:

[1] Aldous, D. and Fill, J. Reversible Markov chains and random walks on graphs. Available at .
[2] Andersson, S. A., Madigan, D. and Perlman, M. D. (1997). A characterization of Markov equivalence classes for acyclic digraphs. Ann. Statist. 25 505-541. · Zbl 0876.60095 · doi:10.1214/aos/1031833662
[3] Castelo, R. and Perlman, M. D. (2004). Learning essential graph Markov models from data. In Advances in Bayesian Networks. Studies in Fuzziness and Soft Computing 146 255-269. Springer, Berlin. · doi:10.1007/978-3-540-39879-0_14
[4] Chickering, D., Geiger, D. and Heckerman, D. (1995). Learning Bayesian networks: Search methods and experimental results. In Proceedings of Fifth Conference on Artificial Intelligence and Statistics 112-128. Ft. Lauerdale, Society for Artificial Intelligence in Statistics, FL. · Zbl 0831.68096
[5] Chickering, D. M. (1995). A transformational characterization of equivalent Bayesian network structures. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence ( Montreal , PQ , 1995) 87-98. Morgan Kaufmann, San Francisco, CA.
[6] Chickering, D. M. (2002). Learning equivalence classes of Bayesian-network structures. J. Mach. Learn. Res. 2 445-498. · Zbl 1007.68179 · doi:10.1162/153244302760200696
[7] Chickering, D. M. (2003). Optimal structure identification with greedy search: Computational learning theory. J. Mach. Learn. Res. 3 507-554. · Zbl 1084.68519 · doi:10.1162/153244303321897717
[8] Cooper, G. F. and Yoo, C. (1999). Causal discovery from a mixture of experimental and observational data. In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence 116-125. Morgan Kaufmann, San Francisco, CA.
[9] Dash, D. and Druzdzel, M. J. (1999). A hybrid anytime algorithm for the construction of causal models from sparse data. In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence 142-149. Morgan Kaufmann, San Mateo, CA.
[10] Dor, D. and Tarsi, M. (1992). A simple algorithm to construct a consistent extension of a partially oriented graph. Technicial Report R-185, Cognitive Systems Laboratory, UCLA.
[11] Eberhardt, F. and Scheines, R. (2007). Interventions and causal inference. Philos. Sci. 74 981-995. · doi:10.1086/525638
[12] Finegold, M. and Drton, M. (2011). Robust graphical modeling of gene networks using classical and alternative \(t\)-distributions. Ann. Appl. Stat. 5 1057-1080. · Zbl 1232.62083 · doi:10.1214/10-AOAS410
[13] Friedman, N. (2004). Inferring cellular networks using probabilistic graphical models. Science Signaling 303 799.
[14] Gillispie, S. B. (2006). Formulas for counting acyclic digraph Markov equivalence classes. J. Statist. Plann. Inference 136 1410-1432. · Zbl 1088.05036 · doi:10.1016/j.jspi.2004.10.007
[15] Gillispie, S. B. and Perlman, M. D. (2002). The size distribution for Markov equivalence classes of acyclic digraph models. Artificial Intelligence 141 137-155. · Zbl 1043.68096 · doi:10.1016/S0004-3702(02)00264-3
[16] He, Y., Jia, J. and Yu, B. (2013). Supplement to “Reversible MCMC on Markov equivalence classes of sparse directed acyclic graphs.” . · Zbl 1360.62369
[17] He, Y.-B. and Geng, Z. (2008). Active learning of causal networks with intervention experiments and optimal designs. J. Mach. Learn. Res. 9 2523-2547. · Zbl 1225.68184
[18] Heckerman, D., Geiger, D. and Chickering, D. M. (1995). Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning 20 197-243. · Zbl 0831.68096
[19] Heckerman, D., Meek, C. and Cooper, G. (1999). A Bayesian approach to causal discovery. In Computation , Causation , and Discovery 141-165. AAAI Press, Menlo Park, CA.
[20] Jansen, R., Yu, H., Greenbaum, D., Kluger, Y., Krogan, N. J., Chung, S., Emili, A., Snyder, M., Greenblatt, J. F. and Gerstein, M. (2003). A Bayesian networks approach for predicting protein-protein interactions from genomic data. Science 302 449.
[21] Kalisch, M. and Buhlmann, P. (2007). Estimating high-dimensional directed acyclic graphs with the pc-algorithm. J. Mach. Learn. Res. 8 613-636. · Zbl 1222.68229
[22] Lauritzen, S. L. and Richardson, T. S. (2002). Chain graph models and their causal interpretations. J. R. Stat. Soc. Ser. B Stat. Methodol. 64 321-361. · Zbl 1090.62103 · doi:10.1111/1467-9868.00340
[23] Lovasz, L. (1993). Random walks on graphs: A survey. Combinatorics : Paul Erdős Is Eighty 2 1-46.
[24] Maathuis, M. H., Kalisch, M. and Bühlmann, P. (2009). Estimating high-dimensional intervention effects from observational data. Ann. Statist. 37 3133-3164. · Zbl 1191.62118 · doi:10.1214/09-AOS685
[25] Madigan, D., Andersson, S. A., Perlman, M. D. and Volinsky, C. T. (1996). Bayesian model averaging and model selection for Markov equivalence classes of acyclic digraphs. Comm. Statist. Theory Methods 25 2493-2519. · Zbl 0894.62032 · doi:10.1080/03610929608831853
[26] Meek, C. (1995). Causal inference and causal explanation with background knowledge. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence ( San Mateo ) 403-410. Morgan Kaufmann, San Francisco, CA.
[27] Munteanu, P. and Bendou, M. (2001). The eq framework for learning equivalence classes of Bayesian networks. In Proceedings IEEE International Conference on Data Mining , 2001. ICDM 2001 417-424. IEEE, San Jose, CA.
[28] Norris, J. R. (1997). Markov Chains. Cambridge Series in Statistical and Probabilistic Mathematics 2 . Cambridge Univ. Press, Cambridge. · Zbl 0873.60043
[29] Peña, J. M. (2007). Approximate counting of graphical models via MCMC. In Proceedings of the 11 th International Conference on Artificial Intelligence 352-359. San Juan, Puerto Rico; available at .
[30] Peña, J. M. (2013). Approximate counting of graphical models via mcmc revisited. Preprint. Available at . 1301.7189
[31] Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems : Networks of Plausible Inference . Morgan Kaufmann, San Mateo, CA. · Zbl 0746.68089
[32] Pearl, J. (2000). Causality : Models , Reasoning , and Inference . Cambridge Univ. Press, Cambridge. · Zbl 0959.68116
[33] Pearl, J. and Verma, T. S. (1991). A theory of inferred causation. In Principles of Knowledge Representation and Reasoning ( Cambridge , MA , 1991) 441-452. Morgan Kaufmann, San Mateo, CA. · Zbl 0765.68177
[34] Perlman, M. D. (2001). Graphical model search via essential graphs. In Algebraic Methods in Statistics and Probability ( Notre Dame , IN , 2000). Contemporary Mathematics 287 255-265. Amer. Math. Soc., Providence, RI. · Zbl 1018.05098 · doi:10.1090/conm/287/04790
[35] Spirtes, P., Glymour, C. N. and Scheines, R. (2001). Causation , Prediction , and Search . MIT Press, Cambridge. · Zbl 0981.62001
[36] Verma, T. and Pearl, J. (1990). Equivalence and synthesis of causal models. In Proceedings of the Sixth Annual Conference on Uncertainty in Artificial Intelligence 270. Elsevier, Amsterdam.
[37] Verma, T. and Pearl, J. (1992). An algorithm for deciding if a set of observed independencies has a causal explanation. In Proceedings of the Eighth International Conference on Uncertainty in Artificial Intelligence 323-330. Morgan Kaufmann, San Mateo, CA.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.