
Doubly robust tests of exposure effects under high-dimensional confounding. (English) Zbl 1520.62184

Summary: After variable selection, standard inferential procedures for regression parameters may not be uniformly valid; there is no finite sample size at which a standard test is guaranteed to approximately attain its nominal size. This problem is exacerbated in high-dimensional settings, where variable selection becomes unavoidable. This has prompted a flurry of activity in developing uniformly valid hypothesis tests for a low-dimensional regression parameter (e.g., the causal effect of an exposure \(A\) on an outcome \(Y\)) in high-dimensional models. So far there has been limited focus on model misspecification, although this is inevitable in high-dimensional settings. We propose tests of the null that are uniformly valid under sparsity conditions weaker than those typically invoked in the literature, assuming working models for the exposure and outcome are both correctly specified. When one of the models is misspecified, by amending the procedure for estimating the nuisance parameters, our tests continue to be valid; hence, they are doubly robust. Our proposals are straightforward to implement using existing software for penalized maximum likelihood estimation and do not require sample splitting. We illustrate them in simulations and an analysis of data obtained from the Ghent University intensive care unit.
{© 2020 The International Biometric Society}
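
To make the idea of a doubly robust test of the null concrete, the following is a minimal sketch of one common form of score-type statistic: the normalized mean of products of exposure-model and outcome-model residuals. It is an illustration under stated assumptions, not the authors' exact procedure. The function name `dr_score_test` and its inputs are hypothetical; the fitted values `a_hat` and `y_hat` are assumed to come from separately fitted working models (for example, penalized regressions of \(A\) and of \(Y\) on the confounders, the latter fitted under the null), and the amended nuisance estimation described in the summary is not reproduced here.

```python
import math


def dr_score_test(a, y, a_hat, y_hat):
    """Two-sided score-type test from products of working-model residuals.

    a, y         : observed exposure and outcome values
    a_hat, y_hat : fitted values from the exposure working model and from
                   the outcome working model fitted under the null
                   (hypothetical inputs, e.g. from penalized regressions
                   on the confounders)
    Returns the standardized statistic z and a normal-reference p-value.
    """
    n = len(a)
    # Per-observation score: exposure residual times outcome residual.
    u = [(ai - ahi) * (yi - yhi)
         for ai, ahi, yi, yhi in zip(a, a_hat, y, y_hat)]
    mean_u = sum(u) / n
    var_u = sum((ui - mean_u) ** 2 for ui in u) / n
    if var_u == 0.0:
        raise ValueError("score contributions have zero variance")
    z = math.sqrt(n) * mean_u / math.sqrt(var_u)
    # Two-sided p-value from the standard normal reference distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Toy inputs chosen only to exercise the function; they are not real data.
z, p = dr_score_test(
    a=[1, 0, 1, 0],
    y=[2.0, 1.0, 3.0, 0.0],
    a_hat=[0.5, 0.5, 0.5, 0.5],
    y_hat=[1.5, 1.5, 1.5, 1.5],
)
```

The residual product has mean zero under the null when either working model is correct, which is the intuition behind double robustness; how the nuisance parameters must be estimated for the test to remain uniformly valid in high dimensions is the subject of the paper itself.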

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
