
Bayesian causal inference in probit graphical models. (English) Zbl 07808148

Summary: We consider a binary response that is potentially affected by a set of continuous variables. Of special interest is the causal effect on the response of an intervention on a specific variable. This effect can be meaningfully determined from observational data under suitable assumptions on the data-generating mechanism. In particular, we assume that the joint distribution obeys the conditional independencies (Markov properties) encoded by a Directed Acyclic Graph (DAG), and the DAG is given a causal interpretation through the notion of interventional distribution. We propose a DAG-probit model in which the response is generated by discretizing a continuous latent variable through a random threshold; the latent variable, jointly with the remaining continuous variables, follows a zero-mean Gaussian distribution whose covariance matrix is constrained to satisfy the Markov properties of the DAG. This covariance matrix is assigned a DAG-Wishart prior through the corresponding Cholesky parameters. Our model leads to a natural definition of the causal effect conditional on a given DAG. Since the DAG that generates the observations is unknown, we present an efficient MCMC algorithm whose target is the joint posterior distribution over the space of DAGs, the Cholesky parameters of the concentration matrix, and the threshold linking the response to the latent variable. Our end result is a Bayesian Model Averaging estimate of the causal effect that incorporates parameter as well as model uncertainty. The methodology is assessed through simulation experiments and applied to a gene expression data set originating from breast cancer stem cells.
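The interventional mechanism described in the summary can be illustrated with a minimal sketch, assuming a small hypothetical DAG with known edge coefficients, error variances and threshold (none of these values come from the paper). Each variable follows a linear Gaussian structural equation; under this parametrization the coefficients `B` and variances `SIGMA2` correspond to a modified Cholesky decomposition of the concentration matrix. The binary response is Y = 1{Y* > γ}, and the intervention do(X_v = x̃) replaces the equation for X_v with the constant x̃. Y* then remains Gaussian, so P(Y = 1 | do(X_v = x̃)) has a closed form, which the code checks against a Monte Carlo simulation of the mutilated system. The helper `bma_prob` is hypothetical and only shows how such a quantity would be averaged over posterior draws of (DAG, parameters, threshold); it is not the authors' MCMC algorithm, and the paper's precise definition of the causal effect may differ from the interventional probability computed here.

```python
import math
import random

# Hypothetical 4-node DAG in topological order (illustrative values, not from the paper).
# B[j] maps each parent k of node j to the structural coefficient of the edge k -> j.
# Node 3 is the latent variable Y*; the observed response is Y = 1{Y* > gamma}.
B = {
    0: {},
    1: {0: 0.8},
    2: {0: -0.5, 1: 0.6},
    3: {1: 1.2, 2: 0.7},
}
# Error variances; the latent's variance is fixed at 1 (assumed probit convention).
SIGMA2 = [1.0, 1.0, 1.0, 1.0]


def norm_cdf(z):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def interventional_prob(B, sigma2, gamma, target, v, x_tilde):
    """Closed-form P(Y = 1 | do(X_v = x_tilde)) for a linear Gaussian DAG.

    Each node is tracked as const + sum_m coef[m] * eps_m, processing nodes in
    topological order 0..p-1. The intervention deletes the structural equation
    of v (its parents and its noise) and fixes X_v at x_tilde, so Y* stays
    Gaussian and the probit probability follows from its mean and variance.
    """
    if target == v:
        raise ValueError("intervene on a variable other than the latent")
    p = len(sigma2)
    const = [0.0] * p
    coef = [[0.0] * p for _ in range(p)]
    for j in range(p):
        if j == v:
            const[j] = x_tilde  # mutilated equation: X_v is set, not generated
            continue
        coef[j][j] = 1.0  # node j's own noise term
        for k, b in B[j].items():
            const[j] += b * const[k]
            for m in range(p):
                coef[j][m] += b * coef[k][m]
    mean = const[target]
    sd = math.sqrt(sum(c * c * s for c, s in zip(coef[target], sigma2)))
    return 1.0 - norm_cdf((gamma - mean) / sd)


def simulate_prob(B, sigma2, gamma, target, v, x_tilde, n=100_000, seed=0):
    """Monte Carlo estimate of the same interventional probability."""
    rng = random.Random(seed)
    p = len(sigma2)
    hits = 0
    for _ in range(n):
        x = [0.0] * p
        for j in range(p):
            if j == v:
                x[j] = x_tilde
            else:
                noise = rng.gauss(0.0, math.sqrt(sigma2[j]))
                x[j] = sum(b * x[k] for k, b in B[j].items()) + noise
        hits += x[target] > gamma
    return hits / n


def bma_prob(draws, target, v, x_tilde):
    """Hypothetical BMA step: average over posterior draws (B, sigma2, gamma)."""
    vals = [interventional_prob(Bd, s2, g, target, v, x_tilde) for Bd, s2, g in draws]
    return sum(vals) / len(vals)


if __name__ == "__main__":
    exact = interventional_prob(B, SIGMA2, 0.0, target=3, v=1, x_tilde=1.0)
    approx = simulate_prob(B, SIGMA2, 0.0, target=3, v=1, x_tilde=1.0)
    print(f"closed form: {exact:.4f}   Monte Carlo: {approx:.4f}")
```

With these illustrative values the total effect of X_1 on Y* is 1.2 + 0.7 × 0.6 = 1.62, so the interventional probability increases with x̃. A contrast of two such probabilities is one possible way to summarize a causal effect on a binary response.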

MSC:

62-XX Statistics

Software:

HdBCS; TETRAD

References:

[1] Abdul, Q., Yu, B., Chung, H., Jung, H., and J. S., C. (2017). “Epigenetic modifications of gene expression by lifestyle and environment.” Archives of Pharmacal Research, 40: 1219-1237. doi:10.1007/s12272-017-0973-3
[2] Albert, J. H. and Chib, S. (1993). “Bayesian analysis of binary and polychotomous response data.” Journal of the American Statistical Association, 88(422): 669-679. MR1224394 · Zbl 0774.62031 · URL http://www.jstor.org/stable/2290350
[3] Andersson, S. A., Madigan, D., and Perlman, M. D. (1997). “A characterization of Markov equivalence classes for acyclic digraphs.” The Annals of Statistics, 25(2): 505-541. MR1439312 · Zbl 0876.60095 · doi:10.1214/aos/1031833662
[4] Barbieri, M. M. and Berger, J. O. (2004). “Optimal predictive model selection.” The Annals of Statistics, 32: 870-897. MR2065192 · Zbl 1092.62033 · doi:10.1214/009053604000000238
[5] Ben-David, E., Li, T., Massam, H., and Rajaratnam, B. (2015). “High dimensional Bayesian inference for Gaussian directed acyclic graph models.” arXiv preprint arXiv:1109.4371.
[6] Campbell, K. L., Landells, C. E., Fan, J., and Brenner, D. R. (2017). “A systematic review of the effect of lifestyle interventions on adipose tissue gene expression: Implications for carcinogenesis.” Obesity, 25(S2): S40-S51. doi:10.1002/oby.22010
[7] Cao, X., Khare, K., and Ghosh, M. (2019). “Posterior graph selection and estimation consistency for high-dimensional Bayesian DAG models.” The Annals of Statistics, 47(1): 319-348. MR3909935 · Zbl 1417.62140 · doi:10.1214/18-AOS1689
[8] Castelletti, F. and Consonni, G. (2021a). “Bayesian inference of causal effects from observational data in Gaussian graphical models.” Biometrics, 77(1): 136-149. MR4229727 · Zbl 1520.62149 · doi:10.1111/biom.13281
[9] Castelletti, F. and Consonni, G. (2021b). “Supplementary Material of ‘Bayesian Causal Inference in Probit Graphical Models’.” Bayesian Analysis. doi:10.1214/21-BA1260SUPP
[10] Dobra, A., Hans, C., Jones, B., Nevins, J. R., Yao, G., and West, M. (2004). “Sparse graphical models for exploring gene expression data.” Journal of Multivariate Analysis, 90(1): 196-212. MR2064941 · Zbl 1047.62104 · doi:10.1016/j.jmva.2004.02.009
[11] Friedman, N. (2004). “Inferring cellular networks using probabilistic graphical models.” Science, 303(5659): 799-805. URL https://science.sciencemag.org/content/303/5659/799
[12] Friedman, N. and Koller, D. (2003). “Being Bayesian about network structure. A Bayesian approach to structure discovery in Bayesian networks.” Machine Learning, 50(1-2): 95-125. Zbl 1033.68104 · doi:10.1023/A:1020249912095
[13] García-Donato, G. and Martínez-Beneito, M. A. (2013). “On sampling strategies in Bayesian variable selection problems with large model spaces.” Journal of the American Statistical Association, 108(501): 340-352. MR3174624 · Zbl 06158347 · doi:10.1080/01621459.2012.742443
[14] Geiger, D. and Heckerman, D. (2002). “Parameter priors for directed acyclic graphical models and the characterization of several probability distributions.” The Annals of Statistics, 30(5): 1412-1440. MR1936324 · Zbl 1016.62064 · doi:10.1214/aos/1035844981
[15] Godsill, S. J. (2001). “On the relationship between Markov chain Monte Carlo methods for model uncertainty.” Journal of Computational and Graphical Statistics, 10(2): 230-248. MR1939699 · doi:10.1198/10618600152627924
[16] Guo, J., Levina, E., Michailidis, G., and Zhu, J. (2015). “Graphical models for ordinal data.” Journal of Computational and Graphical Statistics, 24(1): 183-204. MR3328253 · doi:10.1080/10618600.2014.889023
[17] Lauritzen, S. L. (1996). Graphical Models. Oxford University Press. MR1419991 · Zbl 0907.62001
[18] Maathuis, M. and Nandy, P. (2016). “A review of some recent advances in causal inference.” In Bühlmann, P., Drineas, P., Kane, M., and van der Laan, M. (eds.), Handbook of Big Data, 387-408. Chapman and Hall/CRC. MR3674827
[19] Maathuis, M. H., Kalisch, M., and Bühlmann, P. (2009). “Estimating high-dimensional intervention effects from observational data.” The Annals of Statistics, 37(6A): 3133-3164. MR2549555 · Zbl 1191.62118 · doi:10.1214/09-AOS685
[20] Markowetz, F. and Spang, R. (2007). “Inferring cellular networks – a review.” BMC Bioinformatics, 8(Suppl. 6): S5. doi:10.1186/1471-2105-8-S6-S5
[21] Nandy, P., Maathuis, M. H., and Richardson, T. S. (2017). “Estimating the effect of joint interventions from observational data in sparse high-dimensional settings.” The Annals of Statistics, 45(2): 647-674. MR3650396 · Zbl 1426.62286 · doi:10.1214/16-AOS1462
[22] O’Brien, C. A., Pollett, A., Gallinger, S., and Dick, J. E. (2006). “A human colon cancer cell capable of initiating tumour growth in immunodeficient mice.” Nature, 445: 106-110. doi:10.1038/nature05372
[23] Pearl, J. (1995). “Causal diagrams for empirical research.” Biometrika, 82(4): 669-688. MR1380809 · Zbl 0860.62045 · doi:10.1093/biomet/82.4.669
[24] Pearl, J. (2000). Causality: Models, Reasoning, and Inference. Cambridge University Press, Cambridge. MR1744773 · Zbl 0959.68116
[25] Pearl, J. (2009). “Causal inference in statistics: An overview.” Statistics Surveys, 3: 96-146. MR2545291 · Zbl 1300.62013 · doi:10.1214/09-SS057
[26] Peters, J. and Bühlmann, P. (2014). “Identifiability of Gaussian structural equation models with equal error variances.” Biometrika, 101(1): 219-228. MR3180667 · Zbl 1285.62005 · doi:10.1093/biomet/ast043
[27] Peterson, C., Stingo, F. C., and Vannucci, M. (2015). “Bayesian inference of multiple Gaussian graphical models.” Journal of the American Statistical Association, 110(509): 159-174. PMID: 26078481 · MR3338494 · Zbl 1373.62106 · doi:10.1080/01621459.2014.896806
[28] Sadeghi, K. (2017). “Faithfulness of probability distributions and graphs.” Journal of Machine Learning Research, 18(148): 1-29. MR3763782 · Zbl 1444.62079 · URL http://jmlr.org/papers/v18/17-275.html
[29] Spirtes, P., Glymour, C., and Scheines, R. (2000). Causation, Prediction and Search (2nd edition). Cambridge, MA: The MIT Press. MR1815675 · Zbl 0806.62001
[30] Verma, T. and Pearl, J. (1990). “Equivalence and synthesis of causal models.” In Proceedings of the Sixth Annual Conference on Uncertainty in Artificial Intelligence (UAI ’90), 255-270. New York, NY, USA: Elsevier Science Inc. Zbl 07672243
[31] Wang, H. and Li, S. Z. (2012). “Efficient Gaussian graphical model determination under G-Wishart prior distributions.” Electronic Journal of Statistics, 6: 168-198. MR2879676 · Zbl 1335.62069 · doi:10.1214/12-EJS669
[32] Waugh, D. J. and Wilson, C. (2008). “The interleukin-8 pathway in cancer.” Clinical Cancer Research, 14(21): 6735-6741. doi:10.1158/1078-0432.CCR-07-4843
[33] Xie, Y., Liu, Y., and Valdar, W. (2016). “Joint estimation of multiple dependent Gaussian graphical models with applications to mouse genomics.” Biometrika, 103(3): 493-511. MR3551780 · Zbl 1506.62323 · doi:10.1093/biomet/asw035
[34] Yin, Z.-Q., Liu, J.-J., Xu, Y.-C., Yu, J., Ding, G.-H., Yang, F., Tang, L., Liu, B.-H., Ma, Y., Xia, Y.-W., Lin, X.-L., and Wang, H.-X. (2014). “A 41-gene signature derived from breast cancer stem cells as a predictor of survival.” Journal of Experimental & Clinical Cancer Research, 33: 49. doi:10.1186/1756-9966-33-49
[35] Zhang, J. and Spirtes, P. (2003). “Strong faithfulness and uniform consistency in causal inference.” In Meek, C. and Kjærulff, U. (eds.), UAI ’03, Proceedings of the 19th Conference in Uncertainty in Artificial Intelligence, Acapulco, Mexico, August 7-10, 2003, 632-639. Morgan Kaufmann. URL https://dslpitt.org/uai/displayArticleDetails.jsp?mmnu=1&smnu=2&article_id=983&proceeding_id=19