Augmented direct learning for conditional average treatment effect estimation with double robustness. (English) Zbl 07556938

Summary: Inferring heterogeneous treatment effects is a fundamental problem in many applications. In this paper, we focus on estimating the conditional average treatment effect (CATE), that is, the difference in the conditional mean outcome between treatments given covariates. Traditionally, Q-learning based approaches estimate each conditional mean outcome and are therefore subject to outcome model misspecification. Recently, flexible one-step methods that directly learn the CATE (D-learning) without specifying an outcome model have been proposed; however, they require a specification of the propensity score. We propose robust direct learning (RD-learning), which augments D-learning and leads to doubly robust estimators of the treatment effect: the CATE estimator is consistent if either the main effect model or the propensity score model is correctly specified. The framework applies to both the binary and the multi-arm settings and is general enough to allow different function spaces and to incorporate different generic learning algorithms. We conduct a thorough theoretical analysis of the prediction error of our CATE estimator using statistical learning theory in both linear and non-linear settings. The effectiveness of the proposed method is demonstrated by simulation studies and a real data example from an AIDS Clinical Trials study.
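The summary states the double robustness property without formulas. The following Python sketch illustrates it for the binary, linear case only. It is not the authors' implementation. It assumes a commonly used formulation: treatment coded as A in {-1, +1}, outcome model Y = m(X) + A * tau(X) / 2 + noise with main effect m and CATE tau, and a least squares fit of 2 * A * (Y - m_hat) on the covariates with weights equal to the inverse estimated probability of the observed arm. The function name `fit_cate_linear`, the simulated data, and all constants are hypothetical choices made for illustration.

```python
import numpy as np


def fit_cate_linear(X, A, Y, pi_hat, m_hat):
    """Weighted least squares fit of a linear CATE model tau(x) = b0 + x @ b.

    Assumed conventions (not taken from the summary):
      A      : treatment coded as -1 / +1
      pi_hat : estimated P(A = +1 | X) for each observation
      m_hat  : estimated main effect (average of the two arm means) given X
    Setting m_hat = 0 gives the unaugmented, D-learning style fit.
    """
    prob_observed_arm = np.where(A == 1, pi_hat, 1.0 - pi_hat)
    weights = 1.0 / prob_observed_arm
    response = 2.0 * A * (Y - m_hat)
    design = np.column_stack([np.ones(len(Y)), X])
    # Weighted least squares via row scaling by sqrt(weights).
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(design * sw[:, None], response * sw, rcond=None)
    return coef


# Simulated data (hypothetical): tau(x) = 1 + x1, m(x) = 1 + 2*x1 - x2.
rng = np.random.default_rng(0)
n = 50_000
X = rng.normal(size=(n, 2))
pi_true = 1.0 / (1.0 + np.exp(-0.5 * X[:, 0]))
A = np.where(rng.random(n) < pi_true, 1, -1)
m_true = 1.0 + 2.0 * X[:, 0] - X[:, 1]
tau_true_coef = np.array([1.0, 1.0, 0.0])
Y = m_true + 0.5 * A * (1.0 + X[:, 0]) + rng.normal(size=n)

zeros = np.zeros(n)
half = np.full(n, 0.5)

# Correct propensity, misspecified main effect (m_hat = 0).
coef_pi_ok = fit_cate_linear(X, A, Y, pi_true, zeros)
# Misspecified propensity (constant 0.5), correct main effect.
coef_m_ok = fit_cate_linear(X, A, Y, half, m_true)
# Both misspecified: no consistency guarantee.
coef_both_wrong = fit_cate_linear(X, A, Y, half, zeros)

err_pi_ok = np.max(np.abs(coef_pi_ok - tau_true_coef))
err_m_ok = np.max(np.abs(coef_m_ok - tau_true_coef))
err_both_wrong = np.max(np.abs(coef_both_wrong - tau_true_coef))
```

Under these assumptions, `err_pi_ok` and `err_m_ok` should both be small, whereas `err_both_wrong` reflects the confounding bias that remains when neither nuisance model is correct. In practice `pi_hat` and `m_hat` would be estimated from data rather than supplied as the true functions.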

MSC:

62-XX Statistics
