
Bayesian sample size determination for causal discovery. (English) Zbl 07886120

Summary: Graphical models based on Directed Acyclic Graphs (DAGs) are widely used to answer causal questions across a variety of scientific and social disciplines. However, observational data alone cannot, in general, distinguish between DAGs representing the same conditional independence assertions (Markov equivalent DAGs); as a consequence, the orientation of some edges in the graph remains indeterminate. Interventional data, produced by exogenous manipulations of variables in the network, enhance the process of structure learning because they make it possible to distinguish among equivalent DAGs, thus sharpening causal inference. Starting from an equivalence class of DAGs, a few procedures have been devised to produce a collection of variables to be manipulated in order to identify a causal DAG. Yet these algorithmic approaches do not determine the sample size of the interventional data required to obtain a desired level of statistical accuracy. We tackle this problem from a Bayesian experimental design perspective, taking as input a sequence of target variables to be manipulated to identify edge orientation. We then propose a method to determine, at each intervention, the optimal sample size to produce an experiment which, with high assurance, will deliver an overall probability of decisive and correct evidence.
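
The idea of choosing a sample size so that an experiment yields decisive and correct evidence can be illustrated with a toy simulation. The sketch below is not the procedure of the paper. It assumes a two-variable Gaussian model with known unit error variance, a normal prior on the edge coefficient, equal prior weight on the two orientations X -> Y and Y -> X, and a randomized intervention on X. Under these assumptions, Y responds to the intervention only if the edge is X -> Y, so orienting the edge reduces to a Bayes factor test on a regression slope. All function names, the evidence threshold `k`, the prior scale `tau`, and the grid of sample sizes are hypothetical choices made for illustration.

```python
import math
import random


def log_bf10(x, y, tau=1.0):
    """Log Bayes factor of M1 (X -> Y, slope b ~ N(0, tau^2)) against
    M0 (Y -> X, so Y is unaffected by an intervention on X).
    Unit error variance is assumed known (toy setting)."""
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    shrink = 1.0 + tau ** 2 * sxx
    return -0.5 * math.log(shrink) + 0.5 * tau ** 2 * sxy ** 2 / shrink


def prob_decisive_correct(n, k=3.0, tau=1.0, n_sims=500, rng=None):
    """Monte Carlo estimate of the probability that an interventional
    sample of size n gives decisive evidence (Bayes factor beyond k) for
    the true orientation, with equal prior weight on both orientations."""
    rng = rng or random.Random(0)  # fixed seed so repeated calls agree
    log_k = math.log(k)
    hits = 0
    for sim in range(n_sims):
        m1_true = sim % 2 == 0  # alternate the data-generating model
        b = rng.gauss(0.0, tau) if m1_true else 0.0
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]  # randomized do(X = x)
        y = [b * xi + rng.gauss(0.0, 1.0) for xi in x]
        lbf = log_bf10(x, y, tau)
        if (m1_true and lbf > log_k) or (not m1_true and lbf < -log_k):
            hits += 1
    return hits / n_sims


def smallest_n(target=0.8, grid=range(10, 201, 10), **kwargs):
    """First n on the grid whose estimated probability reaches target,
    or None if no grid value does."""
    for n in grid:
        if prob_decisive_correct(n, **kwargs) >= target:
            return n
    return None
```

Calling `smallest_n(target=0.8)` scans the grid and returns the first sample size whose estimated probability of decisive and correct evidence reaches the target. The estimate is subject to Monte Carlo error, so a larger `n_sims` gives a more stable answer at higher cost.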

MSC:

62-XX Statistics

Software:

R; TETRAD
