Bayesian regression tree models for causal inference: regularization, confounding, and heterogeneous effects (with discussion). (English) Zbl 1475.62102

Summary: This paper presents a novel nonlinear regression model for estimating heterogeneous treatment effects, geared specifically towards situations with small effect sizes, heterogeneous effects, and strong confounding by observables. Standard nonlinear regression models, which may work quite well for prediction, have two notable weaknesses when used to estimate heterogeneous treatment effects. First, they can yield badly biased estimates of treatment effects when fit to data with strong confounding. The Bayesian causal forest model presented in this paper avoids this problem by directly incorporating an estimate of the propensity function in the specification of the response model, implicitly inducing a covariate-dependent prior on the regression function. Second, standard approaches to response surface modeling do not provide adequate control over the strength of regularization over effect heterogeneity. The Bayesian causal forest model permits treatment effect heterogeneity to be regularized separately from the prognostic effect of control variables, making it possible to informatively “shrink to homogeneity”. While we focus on observational data, our methods are equally useful for inferring heterogeneous treatment effects from randomized controlled experiments where careful regularization is somewhat less complicated but no less important. We illustrate these benefits via the reanalysis of an observational study assessing the causal effects of smoking on medical expenditures as well as extensive simulation studies.
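
Schematically, the parameterization described in the summary separates the prognostic and treatment effect components of the response surface; the display below is a sketch of that decomposition (the paper's specific regression tree prior choices and hyperparameters are omitted):
\[
Y_i = \mu\bigl(x_i, \hat{\pi}(x_i)\bigr) + \tau(x_i)\, z_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2),
\]
where \(z_i \in \{0,1\}\) is the treatment indicator, \(\hat{\pi}(x_i)\) is an estimate of the propensity function included as an additional covariate in the prognostic function \(\mu\), and \(\mu\) and \(\tau\) receive separate BART-type regression tree priors. Placing its own, more aggressive regularization prior on \(\tau\) is what allows treatment effect heterogeneity to be shrunk toward homogeneity independently of the fit of \(\mu\).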

MSC:

62D20 Causal inference from observational studies
62F15 Bayesian inference
62J02 General nonlinear regression
62J07 Ridge regression; shrinkage estimators (Lasso)
62P10 Applications of statistics to biology and medical sciences; meta analysis
