
Data-driven algorithms for dimension reduction in causal inference. (English) Zbl 1466.62176

Summary: In observational studies, the causal effect of a treatment may be confounded with variables that are related to both the treatment and the outcome of interest. In order to identify a causal effect, such studies often rely on the unconfoundedness assumption, i.e., that all confounding variables are observed. The choice of covariates to control for, which is primarily based on subject matter knowledge, may result in a large covariate vector in the attempt to ensure that unconfoundedness holds. However, including redundant covariates can affect bias and efficiency of nonparametric causal effect estimators, e.g., due to the curse of dimensionality. Data-driven algorithms for the selection of sufficient covariate subsets are investigated. Under the assumption of unconfoundedness the algorithms search for minimal subsets of the covariate vector. Based, e.g., on the framework of sufficient dimension reduction or kernel smoothing, the algorithms perform a backward elimination procedure assessing the significance of each covariate. Their performance is evaluated in simulations and an application using data from the Swedish Childhood Diabetes Register is also presented.
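
The backward elimination idea mentioned in the summary can be illustrated with a minimal sketch. This is not the authors' algorithm: the summary names sufficient dimension reduction and kernel smoothing as the testing frameworks, whereas the sketch below swaps in ordinary least squares t-statistics as a simple parametric stand-in. The function names `t_statistics` and `backward_eliminate`, the critical value 1.96, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def t_statistics(X, y):
    """Return OLS t-statistics for each column of X (an intercept is added)."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sigma2 = resid @ resid / (n - p - 1)           # residual variance estimate
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return beta[1:] / np.sqrt(np.diag(cov)[1:])    # skip the intercept


def backward_eliminate(X, y, crit=1.96):
    """Drop the least significant covariate until all remaining |t| >= crit.

    Returns the column indices of X that are kept. The threshold `crit` is an
    assumed normal-approximation critical value, not taken from the paper.
    """
    keep = list(range(X.shape[1]))
    while keep:
        t = np.abs(t_statistics(X[:, keep], y))
        weakest = int(np.argmin(t))
        if t[weakest] >= crit:
            break                                   # every covariate is significant
        keep.pop(weakest)                           # remove the weakest and refit
    return keep


# Simulated example (hypothetical data): only columns 0 and 1 affect y.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
y = 1.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=2000)
selected = backward_eliminate(X, y)
```

In this sketch `selected` holds the indices of the retained covariates. Columns with no effect on `y` can still survive by chance at roughly the nominal error rate of the test, so the retained set is a superset of the relevant covariates only with high probability, not with certainty.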

MSC:

62-08 Computational methods for problems pertaining to statistics
62P10 Applications of statistics to biology and medical sciences; meta analysis

Software:

CovSel; glmnet; R; MASS (R)
