Foundations of structural causal models with cycles and latent variables. (English) Zbl 1486.62176

Summary: Structural causal models (SCMs), also known as (nonparametric) structural equation models (SEMs), are widely used for causal modeling purposes. In particular, acyclic SCMs, also known as recursive SEMs, form a well-studied subclass of SCMs that generalize causal Bayesian networks to allow for latent confounders. In this paper, we investigate SCMs in a more general setting, allowing for the presence of both latent confounders and cycles. We show that in the presence of cycles, many of the convenient properties of acyclic SCMs do not hold in general: they do not always have a solution; they do not always induce unique observational, interventional and counterfactual distributions; a marginalization does not always exist, and if it exists the marginal model does not always respect the latent projection; they do not always satisfy a Markov property; and their graphs are not always consistent with their causal semantics. We prove that for SCMs in general each of these properties does hold under certain solvability conditions. Our work generalizes results for SCMs with cycles that were only known for certain special cases so far. We introduce the class of simple SCMs that extends the class of acyclic SCMs to the cyclic setting, while preserving many of the convenient properties of acyclic SCMs. With this paper, we aim to provide the foundations for a general theory of statistical causal modeling with SCMs.
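
Illustration (not taken from the paper): the solvability issue mentioned in the summary can be seen in a toy linear two-variable cycle, X1 = a*X2 + E1 and X2 = b*X1 + E2. The sketch below is only a minimal example under that assumed linear form; the function name `solve_two_cycle` and its return labels are hypothetical. For fixed noise values it reports whether the structural equations have a unique solution, no solution, or infinitely many.

```python
from fractions import Fraction


def solve_two_cycle(a, b, e1, e2):
    """Solve X1 = a*X2 + e1, X2 = b*X1 + e2 for fixed noise values e1, e2.

    Hypothetical helper for illustration only.
    Returns ("unique", (x1, x2)), ("none", None) or ("many", None).
    """
    a, b, e1, e2 = (Fraction(v) for v in (a, b, e1, e2))
    # Substituting the second equation into the first gives
    # (1 - a*b) * X1 = e1 + a*e2, so 1 - a*b decides solvability.
    det = 1 - a * b
    if det != 0:
        x1 = (e1 + a * e2) / det
        x2 = (b * e1 + e2) / det
        return "unique", (x1, x2)
    # det == 0: the equations are consistent only if e1 + a*e2 == 0,
    # and then every value of X1 yields a solution.
    if e1 + a * e2 == 0:
        return "many", None
    return "none", None


# A "damped" cycle has exactly one solution for every noise value.
print(solve_two_cycle(Fraction(1, 2), Fraction(1, 2), 1, 1))
# With a = b = 1 the cycle has no solution unless e1 + e2 == 0.
print(solve_two_cycle(1, 1, 1, 1))
print(solve_two_cycle(1, 1, 1, -1))
```

In this toy case, a = b = 1 with noise satisfying E1 + E2 ≠ 0 almost surely gives a model without any solution, while the acyclic case (a = 0 or b = 0) always has a unique one.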

MSC:

62H22 Probabilistic graphical models
62A09 Graphical methods in statistics
68T30 Knowledge representation
68T37 Reasoning under uncertainty in the context of artificial intelligence

References:

[1] Balke, A. and Pearl, J. (1994). Probabilistic evaluation of counterfactual queries. In Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI-94) 1 230-237. AAAI Press, Menlo Park.
[2] Beckers, S. and Halpern, J. Y. (2019). Abstracting causal models. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19) 33 2678-2685. AAAI Press, Menlo Park.
[3] Blom, T., Bongers, S. and Mooij, J. M. (2019). Beyond structural causal models: Causal constraints models. In Proceedings of the 35th Conference on Uncertainty in Artificial Intelligence (UAI-19) (R. P. Adams and V. Gogate, eds.). AUAI Press.
[4] Bollen, K. A. (1989). Structural Equations with Latent Variables. Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics. Wiley, New York. · Zbl 0731.62159 · doi:10.1002/9781118619179
[5] Bongers, S., Blom, T. and Mooij, J. M. (2021). Causal modeling of dynamical systems. Preprint. Available at arXiv:1803.08784v3 [cs.AI].
[6] Bongers, S., Forré, P., Peters, J. and Mooij, J. M. (2021). Supplement to “Foundations of structural causal models with cycles and latent variables.” https://doi.org/10.1214/21-AOS2064SUPP
[7] Bühlmann, P., Peters, J. and Ernest, J. (2014). CAM: Causal additive models, high-dimensional order search and penalized regression. Ann. Statist. 42 2526-2556. · Zbl 1309.62063 · doi:10.1214/14-AOS1260
[8] Byrne, R. M. J. (2007). The Rational Imagination: How People Create Alternatives to Reality. A Bradford Book. MIT Press, Cambridge, MA.
[9] Cooper, G. F. (1997). A simple constraint-based algorithm for efficiently mining observational databases for causal relationships. Data Min. Knowl. Discov. 1 203-224.
[10] Dawid, A. P. (2002). Influence diagrams for causal modelling and inference. Int. Stat. Rev. 70 161-189. · Zbl 1215.62002
[11] Duncan, O. D. (1975). Introduction to Structural Equation Models: Studies in Population. Academic Press, New York. · Zbl 0337.90019
[12] Eaton, D. and Murphy, K. (2007). Exact Bayesian structure learning from uncertain interventions. In Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics (M. Meila and X. Shen, eds.). Proceedings of Machine Learning Research 2 107-114.
[13] Eberhardt, F., Hoyer, P. and Scheines, R. (2010). Combining experiments to discover linear cyclic models with latent variables. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (Y. W. Teh and M. Titterington, eds.). Proceedings of Machine Learning Research 9 185-192.
[14] Evans, R. J. (2016). Graphs for margins of Bayesian networks. Scand. J. Stat. 43 625-648. · Zbl 1468.62300 · doi:10.1111/sjos.12194
[15] Evans, R. J. (2018). Margins of discrete Bayesian networks. Ann. Statist. 46 2623-2656. · Zbl 1408.62044 · doi:10.1214/17-AOS1631
[16] Fisher, F. M. (1970). A correspondence principle for simultaneous equation models. Econometrica 38 73-92.
[17] Forré, P. and Mooij, J. M. (2017). Markov properties for graphical models with cycles and latent variables. Preprint. Available at arXiv:1710.08775 [math.ST].
[18] Forré, P. and Mooij, J. M. (2018). Constraint-based causal discovery for non-linear structural causal models with cycles and latent confounders. In Proceedings of the 34th Conference on Uncertainty in Artificial Intelligence (UAI-18) (A. Globerson and R. Silva, eds.). AUAI Press.
[19] Forré, P. and Mooij, J. M. (2019). Causal calculus in the presence of cycles, latent confounders and selection bias. In Proceedings of the 35th Conference on Uncertainty in Artificial Intelligence (UAI-19) (R. P. Adams and V. Gogate, eds.). AUAI Press.
[20] Foygel, R., Draisma, J. and Drton, M. (2012). Half-trek criterion for generic identifiability of linear structural equation models. Ann. Statist. 40 1682-1713. · Zbl 1257.62059 · doi:10.1214/12-AOS1012
[21] Goldberger, A. S. and Duncan, O. D. (1973). Structural Equation Models in the Social Sciences. Seminar Press, New York.
[22] Haavelmo, T. (1943). The statistical implications of a system of simultaneous equations. Econometrica 11 1-12. · Zbl 0063.01836 · doi:10.2307/1905714
[23] Halpern, J. (1998). Axiomatizing causal reasoning. In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence (UAI-98) (G. Cooper and S. Moral, eds.) 202-210. Morgan Kaufmann, San Francisco, CA, USA.
[24] Hyttinen, A., Eberhardt, F. and Hoyer, P. O. (2012). Learning linear cyclic causal models with latent variables. J. Mach. Learn. Res. 13 3387-3439. · Zbl 1433.68350
[25] Hyttinen, A., Hoyer, P. O., Eberhardt, F. and Järvisalo, M. (2013). Discovering cyclic causal models with latent variables: A general SAT-based procedure. In Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI-13) (A. Nicholson and P. Smyth, eds.) 301-310. AUAI Press, Corvallis, OR, USA.
[26] Iwasaki, Y. and Simon, H. A. (1994). Causality and model abstraction. Artificial Intelligence 67 143-194. · Zbl 0942.68711 · doi:10.1016/0004-3702(94)90014-0
[27] Kechris, A. S. (1995). Classical Descriptive Set Theory. Graduate Texts in Mathematics 156. Springer, New York. · Zbl 0819.04002 · doi:10.1007/978-1-4612-4190-4
[28] Koster, J. T. A. (1996). Markov properties of nonrecursive causal models. Ann. Statist. 24 2148-2177. · Zbl 0867.62056 · doi:10.1214/aos/1069362315
[29] Koster, J. T. A. (1999). On the validity of the Markov interpretation of path diagrams of Gaussian structural equations systems with correlated errors. Scand. J. Stat. 26 413-431. · Zbl 0947.60057 · doi:10.1111/1467-9469.00157
[30] Lacerda, G., Spirtes, P. L., Ramsey, J. and Hoyer, P. O. (2008). Discovering cyclic causal models by independent components analysis. In Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence (UAI-08) (D. McAllester and P. Myllymaki, eds.) 366-374. AUAI Press, Corvallis, OR, USA.
[31] Lauritzen, S. L. (1996). Graphical Models. Oxford Statistical Science Series 17. Clarendon Press, Oxford. · Zbl 0907.62001
[32] Lauritzen, S. L., Dawid, A. P., Larsen, B. N. and Leimer, H. G. (1990). Independence properties of directed Markov fields. Networks 20 491-505. · Zbl 0743.05065 · doi:10.1002/net.3230200503
[33] Lewis, D. K. (1979). Counterfactual dependence and time’s arrow. Noûs 13 455-476. · doi:10.2307/2215339
[34] Maathuis, M. H., Kalisch, M. and Bühlmann, P. (2009). Estimating high-dimensional intervention effects from observational data. Ann. Statist. 37 3133-3164. · Zbl 1191.62118 · doi:10.1214/09-AOS685
[35] Mani, S. (2006). A Bayesian local causal discovery framework. PhD thesis, Univ. Pittsburgh.
[36] Mason, S. J. (1953). Feedback theory—Some properties of signal flow graphs. In Proceedings of the IRE 41 1144-1156. IEEE.
[37] Mason, S. J. (1956). Feedback theory—Further properties of signal flow graphs. In Proceedings of the IRE 44 920-926. IEEE.
[38] Meek, C. (1995). Strong completeness and faithfulness in Bayesian networks. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence (UAI-95) (P. Besnard and S. Hanks, eds.) 411-418. Morgan Kaufmann, San Francisco, CA, USA.
[39] Mogensen, S. W. and Hansen, N. R. (2020). Markov equivalence of marginalized local independence graphs. Ann. Statist. 48 539-559. · Zbl 1441.62221 · doi:10.1214/19-AOS1821
[40] Mogensen, S. W., Malinsky, D. and Hansen, N. R. (2018). Causal learning for partially observed stochastic dynamical systems. In Proceedings of the Thirty-Fourth Conference on Uncertainty in Artificial Intelligence (UAI-18) (A. Globerson and R. Silva, eds.). AUAI Press.
[41] Mooij, J. M. and Claassen, T. (2020). Constraint-based causal discovery using partial ancestral graphs in the presence of cycles. In Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI-20) (J. Peters and D. Sontag, eds.) 124 1159-1168. PMLR.
[42] Mooij, J. M. and Heskes, T. (2013). Cyclic causal discovery from continuous equilibrium data. In Proceedings of the 29th Conference on Uncertainty in Artificial Intelligence (UAI-13) (A. Nicholson and P. Smyth, eds.) 431-439. AUAI Press, Corvallis, OR, USA.
[43] Mooij, J. M., Janzing, D. and Schölkopf, B. (2013). From ordinary differential equations to structural causal models: The deterministic case. In Proceedings of the 29th Conference on Uncertainty in Artificial Intelligence (UAI-13) (A. Nicholson and P. Smyth, eds.) 440-448. AUAI Press.
[44] Mooij, J. M., Magliacane, S. and Claassen, T. (2020). Joint causal inference from multiple contexts. J. Mach. Learn. Res. 21 Paper No. 99, 108. · Zbl 1507.62224
[45] Mooij, J. M., Peters, J., Janzing, D., Zscheischler, J. and Schölkopf, B. (2016). Distinguishing cause from effect using observational data: Methods and benchmarks. J. Mach. Learn. Res. 17 Paper No. 32, 102. · Zbl 1360.68700
[46] Neal, R. M. (2000). On deducing conditional independence from \(d\)-separation in causal graphs with feedback. J. Artificial Intelligence Res. 12 87-91. · Zbl 0943.68123 · doi:10.1613/jair.689
[47] Pearl, J. (1985). A constraint propagation approach to probabilistic reasoning. In Proceedings of the First Conference on Uncertainty in Artificial Intelligence (UAI-85) (L. Kanal and J. Lemmer, eds.) 31-42. AUAI Press, Corvallis, OR, USA.
[48] Pearl, J. (2009). Causality: Models, Reasoning, and Inference, 2nd ed. Cambridge Univ. Press, Cambridge. · Zbl 1188.68291 · doi:10.1017/CBO9780511803161
[49] Pearl, J. and Dechter, R. (1996). Identifying independence in causal graphs with feedback. In Proceedings of the Twelfth Conference on Uncertainty in Artificial Intelligence (UAI-96) (E. Horvitz and F. Jensen, eds.) 420-426. Morgan Kaufmann, San Francisco, CA, USA.
[50] Pearl, J. and Mackenzie, D. (2018). The Book of Why: The New Science of Cause and Effect. Basic Books, New York. · Zbl 1416.62026
[51] Peters, J., Janzing, D. and Schölkopf, B. (2017). Elements of Causal Inference: Foundations and Learning Algorithms. Adaptive Computation and Machine Learning. MIT Press, Cambridge, MA. · Zbl 1416.62012
[52] Peters, J., Mooij, J. M., Janzing, D. and Schölkopf, B. (2014). Causal discovery with continuous additive noise models. J. Mach. Learn. Res. 15 2009-2053. · Zbl 1318.68151
[53] Pfister, N., Bauer, S. and Peters, J. (2019). Learning stable and predictive structures in kinetic systems. Proc. Natl. Acad. Sci. USA 116 25405-25411. · Zbl 1456.70002 · doi:10.1073/pnas.1905688116
[54] Richardson, T. (2003). Markov properties for acyclic directed mixed graphs. Scand. J. Stat. 30 145-157. · Zbl 1035.60005 · doi:10.1111/1467-9469.00323
[55] Richardson, T. and Spirtes, P. (1999). Automated discovery of linear feedback models. In Computation, Causation, and Discovery (C. Glymour and G. F. Cooper, eds.) 253-302. AAAI Press, Menlo Park, CA.
[56] Richardson, T. and Spirtes, P. (2002). Ancestral graph Markov models. Ann. Statist. 30 962-1030. · Zbl 1033.60008 · doi:10.1214/aos/1031689015
[57] Richardson, T. S. (1996). A discovery algorithm for directed cyclic graphs. In Proceedings of the Twelfth Conference on Uncertainty in Artificial Intelligence (UAI-96) (E. Horvitz and F. Jensen, eds.) 454-461. Morgan Kaufmann, San Francisco, CA, USA.
[58] Richardson, T. S. (1996). Discovering cyclic causal structure. Technical Report No. CMU-PHIL-68, Carnegie Mellon Univ.
[59] Richardson, T. S. (1996). Models of feedback: Interpretation and discovery. Ph.D. thesis, Carnegie Mellon Univ.
[60] Richardson, T. S. and Robins, J. (2013). Single world intervention graphs (SWIGs): A unification of the counterfactual and graphical approaches to causality. Technical Report No. 128, Center for Statistics and the Social Sciences.
[61] Roese, N. J. (1997). Counterfactual thinking. Psychol. Bull. 121 133-148.
[62] Rubenstein, P. K., Weichwald, S., Bongers, S., Mooij, J. M., Janzing, D., Grosse-Wentrup, M. and Schölkopf, B. (2017). Causal consistency of structural equation models. In Proceedings of the 33rd Conference on Uncertainty in Artificial Intelligence (UAI-17) (G. Elidan and K. Kersting, eds.). AUAI Press.
[63] Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. J. Educ. Psychol. 66 688-701.
[64] Shpitser, I. and Pearl, J. (2008). Complete identification methods for the causal hierarchy. J. Mach. Learn. Res. 9 1941-1979. · Zbl 1225.68216
[65] Spirtes, P. (1993). Directed cyclic graphs, conditional independence, and non-recursive linear structural equation models. Technical Report No. CMU-PHIL-35, Carnegie Mellon Univ.
[66] Spirtes, P. (1994). Conditional independence in directed cyclic graphical models for feedback. Technical Report No. CMU-PHIL-54, Carnegie Mellon Univ.
[67] Spirtes, P. (1995). Directed cyclic graphical representations of feedback models. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence (UAI-95) (P. Besnard and S. Hanks, eds.) 499-506. Morgan Kaufmann, San Francisco, CA, USA.
[68] Spirtes, P., Glymour, C. and Scheines, R. (2000). Causation, Prediction, and Search, 2nd ed. Adaptive Computation and Machine Learning. MIT Press, Cambridge, MA. · Zbl 0806.62001
[69] Spirtes, P., Meek, C. and Richardson, T. (1999). An algorithm for causal inference in the presence of latent variables and selection bias. In Computation, Causation, and Discovery (C. Glymour and G. F. Cooper, eds.) 211-252. AAAI Press, Menlo Park, CA.
[70] Spirtes, P., Richardson, T., Meek, C., Scheines, R. and Glymour, C. (1998). Using path diagrams as a structural equation modelling tool. Sociol. Methods Res. 27 182-225.
[71] Tian, J. (2002). Studies in causal reasoning and learning. Technical Report No. R-309, Cognitive Systems Laboratory, Univ. California, Los Angeles, USA.
[72] Tian, J. and Pearl, J. (2001). Causal discovery from changes. In Proceedings of the 17th Conference in Uncertainty in Artificial Intelligence (UAI-01) (J. Breese and D. Koller, eds.) 512-521. Morgan Kaufmann, San Francisco, CA, USA.
[73] Verma, T. S. (1993). Graphical aspects of causal models. Technical Report No. R-191. Computer Science Department, Univ. California, Los Angeles, USA.
[74] Wright, S. (1921). Correlation and causation. J. Agric. Res. 20 557-585.
[75] Zhang, J. (2008). On the completeness of orientation rules for causal discovery in the presence of latent confounders and selection bias. Artificial Intelligence 172 1873-1896. · Zbl 1184.68434 · doi:10.1016/j.artint.2008.08.001
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.