Two-sample instrumental variable analyses using heterogeneous samples. (English) Zbl 1420.62060

Summary: Instrumental variable analysis is a widely used method to estimate causal effects in the presence of unmeasured confounding. When the instruments, exposure and outcome are not measured in the same sample, J. D. Angrist and A. B. Krueger [“The effect of age at school entry on educational attainment: an application of instrumental variables with moments from two samples”, J. Am. Stat. Assoc. 87, No. 418, 328–336 (1992; doi:10.2307/2290263)] suggested using two-sample instrumental variable (TSIV) estimators that use sample moments from an instrument-exposure sample and an instrument-outcome sample. However, this method is biased if the two samples are from heterogeneous populations so that the distributions of the instruments are different. In linear structural equation models, we derive a new class of TSIV estimators that are robust to heterogeneous samples under the key assumption that the structural relations in the two samples are the same. The widely used two-sample two-stage least squares estimator belongs to this class. It is generally not asymptotically efficient, although we find that it performs similarly to the optimal TSIV estimator in most practical situations. We then attempt to relax the linearity assumption. We find that, unlike one-sample analyses, the TSIV estimator is not robust to a misspecified exposure model. Additionally, to nonparametrically identify the magnitude of the causal effect, the noise in the exposure must have the same distribution in the two samples. However, this assumption is in general untestable because the exposure is not observed in one sample. Nonetheless, we may still identify the sign of the causal effect in the absence of homogeneity of the noise.
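
Illustration: the two-sample two-stage least squares estimator mentioned in the summary can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `ts2sls` and `draw_sample`, the coefficient values and the simulated linear structural model are all hypothetical. The first stage is fitted in the instrument-exposure sample, and the fitted exposure is then used as the regressor in the instrument-outcome sample. The two samples are drawn with different instrument distributions but the same structural relations, which is the setting the summary describes.

```python
import numpy as np


def ts2sls(z_outcome, y_outcome, z_exposure, x_exposure):
    """Two-sample 2SLS for a scalar exposure (illustrative sketch).

    z_outcome, y_outcome: instruments and outcome from the instrument-outcome sample.
    z_exposure, x_exposure: instruments and exposure from the instrument-exposure sample.
    """
    # Add an intercept column to both instrument matrices.
    za = np.column_stack([np.ones(len(z_outcome)), z_outcome])
    zb = np.column_stack([np.ones(len(z_exposure)), z_exposure])
    # First stage: regress the exposure on the instruments in the exposure sample.
    gamma_hat, *_ = np.linalg.lstsq(zb, x_exposure, rcond=None)
    # Predict the unobserved exposure in the outcome sample.
    x_pred = za @ gamma_hat
    # Second stage: regress the outcome on the predicted exposure.
    design = np.column_stack([np.ones(len(x_pred)), x_pred])
    coef, *_ = np.linalg.lstsq(design, y_outcome, rcond=None)
    return coef[1]


def draw_sample(rng, n, z_mean, z_sd, beta=0.5):
    """Hypothetical linear structural model shared by both samples."""
    gamma = np.array([0.5, 0.3, 0.2])      # assumed instrument effects
    z = rng.normal(z_mean, z_sd, size=(n, 3))
    u = rng.normal(size=n)                 # unmeasured confounder
    x = z @ gamma + u + rng.normal(size=n)
    y = beta * x + u + rng.normal(size=n)
    return z, x, y


rng = np.random.default_rng(0)
# Outcome sample: the exposure is discarded, as if it had not been measured.
z_a, _, y_a = draw_sample(rng, 50_000, z_mean=0.0, z_sd=1.0)
# Exposure sample: different instrument distribution, outcome discarded.
z_b, x_b, _ = draw_sample(rng, 50_000, z_mean=0.5, z_sd=2.0)
beta_hat = ts2sls(z_a, y_a, z_b, x_b)
print(f"TS2SLS estimate: {beta_hat:.3f} (true value 0.5)")
```

The sketch covers only the correctly specified linear case. It does not illustrate the optimal TSIV estimator or the nonlinear results discussed in the summary.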

MSC:

62D05 Sampling theory, sample surveys
62P10 Applications of statistics to biology and medical sciences; meta analysis

Software:

LDlink

References:

[1] 1000 Genomes Project Consortium (2015). A global reference for human genetic variation. Nature 526 68-74.
[2] Abadie, A. (2003). Semiparametric instrumental variable estimation of treatment response models. J. Econometrics 113 231-263. · Zbl 1038.62113 · doi:10.1016/S0304-4076(02)00201-4
[3] Anderson, T. W. and Rubin, H. (1949). Estimation of the parameters of a single equation in a complete system of stochastic equations. Ann. Math. Stat. 20 46-63. · Zbl 0033.08002 · doi:10.1214/aoms/1177730090
[4] Angrist, J. D., Graddy, K. and Imbens, G. W. (2000). The interpretation of instrumental variables estimators in simultaneous equations models with an application to the demand for fish. Rev. Econ. Stud. 67 499-527. · Zbl 1055.91519 · doi:10.1111/1467-937X.00141
[5] Angrist, J. D., Imbens, G. W. and Rubin, D. B. (1996). Identification of causal effects using instrumental variables. J. Amer. Statist. Assoc. 91 444-455. · Zbl 0897.62130 · doi:10.1080/01621459.1996.10476902
[6] Angrist, J. D. and Krueger, A. B. (1992). The effect of age at school entry on educational attainment: An application of instrumental variables with moments from two samples. J. Amer. Statist. Assoc. 87 328-336.
[7] Angrist, J. D. and Krueger, A. B. (1995). Split-sample instrumental variables estimates of the return to schooling. J. Bus. Econom. Statist. 13 225-235.
[8] Baiocchi, M., Cheng, J. and Small, D. S. (2014). Instrumental variable methods for causal inference. Stat. Med. 33 2297-2340.
[9] Baker, S. G. and Lindeman, K. S. (1994). The paired availability design: A proposal for evaluating epidural analgesia during labor. Stat. Med. 13 2269-2278.
[10] Balke, A. and Pearl, J. (1997). Bounds on treatment effects from studies with imperfect compliance. J. Amer. Statist. Assoc. 92 1171-1176. · Zbl 0888.62049 · doi:10.1080/01621459.1997.10474074
[11] Barbeira, A., Dickinson, S. P., Bonazzola, R., Zheng, J., Wheeler, H. E., Torres, J. M., Torstenson, E. S., Shah, K. P., Garcia, T. et al. (2018). Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics. Nat. Commun. 9 1825.
[12] Bowden, J., Davey Smith, G. and Burgess, S. (2015). Mendelian randomization with invalid instruments: Effect estimation and bias detection through Egger regression. Int. J. Epidemiol. 44 512-525.
[13] Bowden, J., Davey Smith, G., Haycock, P. C. and Burgess, S. (2016). Consistent estimation in Mendelian randomization with some invalid instruments using a weighted median estimator. Genet. Epidemiol. 40 304-314.
[14] Buja, A., Berk, R., Brown, L., George, E., Pitkin, E., Traskin, M., Zhao, L. and Zhang, K. (2014). Models as approximations, part I: A conspiracy of nonlinearity and random regressors in linear regression. Statist. Sci. Available at arXiv:1404.1578. · Zbl 1440.62020
[15] Burgess, S., Small, D. S. and Thompson, S. G. (2017). A review of instrumental variable estimators for Mendelian randomization. Stat. Methods Med. Res. 26 2333-2355.
[16] Burgess, S., Scott, R. A., Timpson, N. J., Smith, G. D., Thompson, S. G. and EPIC-InterAct Consortium (2015). Using published data in Mendelian randomization: A blueprint for efficient identification of causal risk factors. Eur. J. Epidemiol. 30 543-552.
[17] Choi, J., Gu, J. and Shen, S. (2018). Weak-instrument robust inference for two-sample instrumental variables regression. J. Appl. Econometrics 33 109-125.
[18] Currie, J. and Yelowitz, A. (2000). Are public housing projects good for kids? J. Public Econ. 75 99-124.
[19] Davey Smith, G. and Ebrahim, S. (2003). “Mendelian randomization”: Can genetic epidemiology contribute to understanding environmental determinants of disease? Int. J. Epidemiol. 32 1-22.
[20] Davey Smith, G. and Hemani, G. (2014). Mendelian randomization: Genetic anchors for causal inference in epidemiological studies. Hum. Mol. Genet. 23 R89-98.
[21] Davidson, R. and MacKinnon, J. G. (1993). Estimation and Inference in Econometrics. Oxford University Press, New York. · Zbl 1009.62596
[22] Fuller, W. A. (1977). Some properties of a modification of the limited information estimator. Econometrica 45 939-953. · Zbl 0387.62056 · doi:10.2307/1912683
[23] Gamazon, E. R., Wheeler, H. E., Shah, K. P., Mozaffari, S. V., Aquino-Michaels, K., Carroll, R. J., Eyler, A. E., Denny, J. C., Nicolae, D. L. et al. (2015). A gene-based association method for mapping traits using reference transcriptome data. Nat. Genet. 47 1091-1098.
[24] Graham, B. S., Pinto, C. C. X. and Egel, D. (2016). Efficient estimation of data combination models by the method of auxiliary-to-study tilting (AST). J. Bus. Econom. Statist. 34 288-301.
[25] Haavelmo, T. (1944). The probability approach in econometrics. Econometrica 12 S1-S115. · Zbl 0063.01837 · doi:10.2307/1906935
[26] Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators. Econometrica 50 1029-1054. · Zbl 0502.62098 · doi:10.2307/1912775
[27] Hansen, C., Hausman, J. and Newey, W. (2008). Estimation with many instrumental variables. J. Bus. Econom. Statist. 26 398-422.
[28] Hemani, G., Zheng, J., Elsworth, B., Wade, K. H., Haberland, V., Baird, D., Laurin, C., Burgess, S., Bowden, J. et al. (2018). The MR-Base platform supports systematic causal inference across the human phenome. eLife 7 e34408.
[29] Hernán, M. A. and Robins, J. M. (2006). Instruments for causal inference: An epidemiologist’s dream? Epidemiology 360-372.
[30] Imbens, G. W. (2007). Nonadditive models with endogenous regressors. In Advances in Economics and Econometrics (R. Blundell, W. Newey and T. Persson, eds.) 3 17-46. Cambridge Univ. Press, Cambridge. · Zbl 1137.62087
[31] Imbens, G. and Angrist, J. (1994). Identification and estimation of local average treatment effects. Econometrica 62 467-475. · Zbl 0800.90648 · doi:10.2307/2951620
[32] Inoue, A. and Solon, G. (2010). Two-sample instrumental variables estimators. Rev. Econ. Stat. 92 557-561.
[33] Jappelli, T., Pischke, J.-S. and Souleles, N. S. (1998). Testing for liquidity constraints in Euler equations with complementary data sources. Rev. Econ. Stat. 80 251-262.
[34] Kang, H., Zhang, A., Cai, T. T. and Small, D. S. (2016). Instrumental variables estimation with some invalid instruments and its application to Mendelian randomization. J. Amer. Statist. Assoc. 111 132-144.
[35] Lawlor, D. A. (2016). Commentary: Two-sample Mendelian randomization: Opportunities and challenges. Int. J. Epidemiol. 45 908-915.
[36] Lawlor, D. A., Harbord, R. M., Sterne, J. A. C., Timpson, N. and Smith, G. D. (2008). Mendelian randomization: Using genes as instruments for making causal inferences in epidemiology. Stat. Med. 27 1133-1163.
[37] Locke, A. E., Kahali, B., Berndt, S. I., Justice, A. E., Pers, T. H., Day, F. R., Powell, C., Vedantam, S., Buchkovich, M. L. et al. (2015). Genetic studies of body mass index yield new insights for obesity biology. Nature 518 197-206.
[38] Machiela, M. J. and Chanock, S. J. (2015). LDlink: A web-based application for exploring population-specific haplotype structure and linking correlated alleles of possible functional variants. Bioinformatics 31 3555-3557.
[39] Manolio, T. A., Collins, F. S., Cox, N. J., Goldstein, D. B., Hindorff, L. A., Hunter, D. J., McCarthy, M. I., Ramos, E. M., Cardon, L. R. et al. (2009). Finding the missing heritability of complex diseases. Nature 461 747-753.
[40] Ogburn, E. L., Rotnitzky, A. and Robins, J. M. (2015). Doubly robust estimation of the local average treatment effect curve. J. R. Stat. Soc. Ser. B. Stat. Methodol. 77 373-396. · Zbl 1414.62114 · doi:10.1111/rssb.12078
[41] Pacini, D. (2018). The two-sample linear regression model with interval-censored covariates. J. Appl. Econometrics 34 66-81.
[42] Pacini, D. and Windmeijer, F. (2016). Robust inference for the two-sample 2SLS estimator. Econom. Lett. 146 50-54. · Zbl 1398.62183 · doi:10.1016/j.econlet.2016.06.033
[43] Pearl, J. (2009). Causality: Models, Reasoning, and Inference, 2nd ed. Cambridge Univ. Press, Cambridge. · Zbl 1188.68291
[44] Peters, J., Bühlmann, P. and Meinshausen, N. (2016). Causal inference by using invariant prediction: Identification and confidence intervals. J. R. Stat. Soc. Ser. B. Stat. Methodol. 78 947-1012. · Zbl 1414.62297 · doi:10.1111/rssb.12167
[45] Pierce, B. L. and Burgess, S. (2013). Efficient design for Mendelian randomization studies: Subsample and 2-sample instrumental variable estimators. Am. J. Epidemiol. 178 1177-1184.
[46] Ridder, G. and Moffitt, R. (2007). The econometrics of data combination. Handb. Econom. 6 5469-5547.
[47] Sherry, S. T., Ward, M.-H., Kholodov, M., Baker, J., Phan, L., Smigielski, E. M. and Sirotkin, K. (2001). dbSNP: The NCBI database of genetic variation. Nucleic Acids Res. 29 308-311.
[48] Stock, J. H., Wright, J. H. and Yogo, M. (2002). A survey of weak instruments and weak identification in generalized method of moments. J. Bus. Econom. Statist. 20 518-529.
[49] Theil, H. (1958). Economic Forecasts and Policy. North-Holland, Amsterdam.
[50] Vansteelandt, S. and Didelez, V. (2015). Robustness and efficiency of covariate adjusted linear instrumental variable estimators. Preprint. Available at arXiv:1510.01770. · Zbl 1408.62098 · doi:10.1111/sjos.12329
[51] Wald, A. (1940). The fitting of straight lines if both variables are subject to error. Ann. Math. Stat. 11 285-300. · Zbl 0023.34402 · doi:10.1214/aoms/1177731868
[52] Wang, L. and Tchetgen Tchetgen, E. (2018). Bounded, efficient and multiply robust estimation of average treatment effects using instrumental variables. J. R. Stat. Soc. Ser. B. Stat. Methodol. 80 531-550. · Zbl 1398.62348 · doi:10.1111/rssb.12262
[53] White, H. (1980). Using least squares to approximate unknown regression functions. Internat. Econom. Rev. 21 149-170. · Zbl 0444.62119 · doi:10.2307/2526245
[54] Wright, P. G. (1928). Tariff on Animal and Vegetable Oils. Macmillan, New York.
[55] Yang, J., Ferreira, T., Morris, A. P., Medland, S. E., Madden, P. A., Heath, A. C., Martin, N. G., Montgomery, G. W., Weedon, M. N. et al. (2012). Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat. Genet. 44 369-375.
[56] Zhao, Q., Wang, J., Bowden, J. and Small, D. S. (2019). Statistical inference in two-sample summary-data Mendelian randomization using robust adjusted profile score. Ann. Statist. To appear. Available at arXiv:1801.09652. · Zbl 1465.62050
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.