Marginal integration for nonparametric causal inference. (English) Zbl 1330.62171

Summary: We consider the problem of inferring the total causal effect of a single continuous variable intervention on a (response) variable of interest. We propose a marginal integration regression technique for a very general class of potentially nonlinear structural equation models (SEMs) with known structure, or at least a known superset of adjustment variables; we call the procedure S-mint regression. We easily derive that it achieves the same convergence rate as for nonparametric regression: for example, single variable intervention effects can be estimated with convergence rate \(n^{-2/5}\) assuming twice differentiable functions. Our result can also be seen as a major robustness property with respect to model misspecification, one that goes well beyond the notion of double robustness. Furthermore, when the structure of the SEM is not known, we can estimate (the equivalence class of) the directed acyclic graph corresponding to the SEM, and then proceed by using S-mint based on these estimates. We empirically compare the S-mint regression method with more classical approaches and argue that the former is indeed more robust, more reliable and substantially simpler.
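The idea of marginal integration described in the summary can be illustrated with a small sketch. It is not the authors' implementation: it assumes a single observed adjustment variable `s`, a Nadaraya-Watson smoother with a Gaussian product kernel, a hand-picked bandwidth, and a simulated toy SEM, all of which are illustrative choices. The quantity approximated is \(E[Y \mid \mathrm{do}(X = x_0)] = \int E[Y \mid X = x_0, S = s]\, dP(s)\), with the integral replaced by an average over the observed values of `s`.

```python
import numpy as np


def nw_regression(x_train, y_train, x_query, bandwidth):
    """Nadaraya-Watson regression with a Gaussian product kernel.

    x_train has shape (n, d), x_query has shape (m, d); returns m fitted values.
    """
    diffs = (x_query[:, None, :] - x_train[None, :, :]) / bandwidth
    weights = np.exp(-0.5 * np.sum(diffs ** 2, axis=2))
    return weights @ y_train / weights.sum(axis=1)


def marginal_integration(x, s, y, x0, bandwidth):
    """Estimate E[Y | do(X = x0)] by averaging the fitted regression
    surface m(x0, s_i) over the observed adjustment values s_i."""
    train = np.column_stack([x, s])
    query = np.column_stack([np.full(len(x), x0), s])
    return float(np.mean(nw_regression(train, y, query, bandwidth)))


def simulate(n, seed=0):
    """Toy SEM (an assumption for illustration): S -> X, S -> Y, X -> Y."""
    rng = np.random.default_rng(seed)
    s = rng.normal(size=n)
    x = 0.5 * s + rng.normal(size=n)
    y = np.sin(x) + 2.0 * s + 0.1 * rng.normal(size=n)
    return x, s, y


if __name__ == "__main__":
    x, s, y = simulate(1500)
    estimate = marginal_integration(x, s, y, x0=1.0, bandwidth=0.2)
    # In this toy model the true value is sin(1), because E[S] = 0.
    print("estimate:", estimate, "truth:", np.sin(1.0))
```

In this toy model, regressing `y` on `x` alone would target \(E[Y \mid X = x_0]\), which differs from the intervention effect because `s` confounds `x` and `y`; averaging over the marginal distribution of `s` is what removes that dependence.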

MSC:

62G05 Nonparametric estimation
62H12 Estimation in multivariate analysis
