The pitfall of instrumental variables in big data: what the rule of thumb can’t give you. (English) Zbl 07551942

Summary: Background: Instrumental variables (IVs) have become much easier to find in the “big data” era, which has increased the number of applications of the two-stage least squares (TSLS) model. With the increased availability of IVs, the possibility that these IVs are weak has also increased. Prior work has suggested a ‘rule of thumb’ that IVs with a first-stage F statistic of at least ten will avoid a relative bias in point estimates greater than 10%. We investigated whether this threshold is also an effective guarantee of low false rejection rates for the null hypothesis test in TSLS applications with many IVs.
Objective: To test how the ‘rule of thumb’ for weak instruments performs in predicting low false rejection rates in the TSLS model when the number of IVs is large.
Method: We used a Monte Carlo approach to create 28 original data sets for different models, with the number of IVs varying from 3 to 30. For each model, we generated 2000 observations per iteration and conducted 50,000 iterations to reach convergence in rejection rates. The true value of the coefficient of interest was set to 0, and the probability of rejecting this null hypothesis was recorded for each model as a measure of the false rejection rate. The relationship between the endogenous variable and the IVs was carefully adjusted so that the F statistic for the first-stage model equaled ten, thus simulating the ‘rule of thumb.’
Results: We found that the false rejection rate (type I error) increased as the number of IVs in the TSLS model increased, while the F statistic for the first-stage model was held equal to 10. The false rejection rate exceeded 10% when TSLS had 24 IVs and exceeded 15% when TSLS had 30 IVs.
Conclusion: When more instrumental variables were used in the model, the ‘rule of thumb’ was no longer an effective guarantee of good performance in hypothesis testing. A more restrictive threshold for the first-stage F statistic is recommended to replace the ‘rule of thumb,’ especially when the number of instrumental variables is large.
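
The Monte Carlo design described in the summary can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' code: the data-generating process (independent standard normal instruments, equal first-stage coefficients, no intercept), the error correlation `rho = 0.5`, the 5% two-sided test with a normal critical value, and the function name `tsls_false_rejection_rate` are all assumptions that the summary does not specify. The first-stage coefficients are chosen so that the expected first-stage F statistic is approximately 10, rather than exactly 10 in every replication.

```python
import numpy as np


def tsls_false_rejection_rate(k, n=2000, reps=500, rho=0.5, target_f=10.0, seed=0):
    """Estimate the type I error of the TSLS t-test of beta = 0 with k instruments.

    Assumed model (not taken from the paper):
        x = Z @ pi + v,   y = beta * x + u,   beta = 0,   corr(u, v) = rho.
    Returns (false rejection rate, mean first-stage F statistic).
    """
    rng = np.random.default_rng(seed)
    # With unit-variance instruments and errors, E[F] is roughly 1 + n * c**2,
    # so this choice of c targets an expected first-stage F of target_f.
    c = np.sqrt((target_f - 1.0) / n)
    pi = np.full(k, c)
    crit = 1.959963984540054  # two-sided 5% critical value of the standard normal

    rejections = 0
    f_stats = np.empty(reps)
    for r in range(reps):
        Z = rng.standard_normal((n, k))
        v = rng.standard_normal(n)
        u = rho * v + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
        x = Z @ pi + v
        y = u  # true coefficient on x is 0

        # First stage: regress x on Z and compute the F statistic for pi = 0.
        coef, *_ = np.linalg.lstsq(Z, x, rcond=None)
        x_hat = Z @ coef
        resid1 = x - x_hat
        f_stats[r] = (x_hat @ x_hat / k) / (resid1 @ resid1 / (n - k))

        # Second stage: TSLS estimate and its conventional standard error,
        # using residuals computed with the original x (not x_hat).
        beta_hat = (x_hat @ y) / (x_hat @ x)
        resid2 = y - beta_hat * x
        se = np.sqrt((resid2 @ resid2 / (n - 1)) / (x_hat @ x_hat))
        rejections += int(abs(beta_hat / se) > crit)

    return rejections / reps, float(f_stats.mean())


if __name__ == "__main__":
    for k in (3, 24, 30):
        rate, mean_f = tsls_false_rejection_rate(k)
        print(f"k={k:2d}  mean first-stage F={mean_f:5.2f}  false rejection rate={rate:.3f}")
```

The default of 500 replications keeps the sketch fast; the summary reports 50,000 iterations per model. The rejection rates produced here depend on the assumed `rho` and design, so they need not match the figures reported above.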

MSC:

62-XX Statistics

Software:

rivtest
Full Text: DOI

References:

[1] Angrist, J. D.; Krueger, A. B., Does compulsory school attendance affect schooling and earnings?, 1991. The Quarterly Journal of Economics, 106, 4, 979-1014
[2] Blomquist, S.; Dahlberg, M., Small sample properties of LIML and jackknife IV estimators: experiments with weak instruments, 1999. Journal of Applied Econometrics, 14, 1, 69-88
[3] Davidson, R.; MacKinnon, J. G., Estimation and inference in econometrics, 1993. OUP Catalogue · Zbl 1009.62596
[4] Finlay, K.; Magnusson, L. M., Implementing weak-instrument robust tests for a general class of instrumental-variables models, 2009. Stata Journal, 9, 3, 398
[5] Flores-Lagunes, A., Finite sample evidence of IV estimators under weak instruments, 2007. Journal of Applied Econometrics, 22, 3, 677-694
[6] Goldsmith, L. P., Psychological treatments for early psychosis can be beneficial or harmful, depending on the therapeutic alliance: An instrumental variable analysis, 2015. Psychological Medicine, 45, 11, 2365-2373
[7] Hansen, C.; Kozbur, D., Instrumental variables estimation with many weak instruments using regularized JIVE, 2014. Journal of Econometrics, 182, 2, 290-308 · Zbl 1311.62097
[8] Hollingsworth, J. M., Medical expulsive therapy versus early endoscopic stone removal for acute renal colic: an instrumental variable analysis, 2013. The Journal of Urology, 190, 3, 882-887
[9] Imbens, G. W.; Rosenbaum, P. R., Robust, accurate confidence intervals with a weak instrument: Quarter of birth and education, 2005. Journal of the Royal Statistical Society. Series A (Statistics in Society), 168, 1, 109-126 · Zbl 1101.62120
[10] Koch, S., Achieving Holistic Health for the Individual through Person-Centered Collaborative Care Supported by Informatics, 2013. Healthcare Informatics Research, 19, 1, 3-8
[11] Kowalski, A., Censored quantile instrumental variable estimates of the price elasticity of expenditure on medical care, 2016. Journal of Business & Economic Statistics, 34, 1, 107-117
[12] Martens, E. P., Instrumental variables: Application and limitations, 2006. Epidemiology, 17, 3, 260-267
[13] Michael, K.; Miller, K. W., Big data: New opportunities and new challenges [Guest editors’ introduction], 2013. Computer, 46, 6, 22-24
[14] Murray, M. P., Avoiding invalid instruments and coping with weak instruments, 2006. The Journal of Economic Perspectives, 20, 4, 111-132
[15] Small, D. S., Sensitivity analysis for instrumental variables regression with overidentifying restrictions, 2007. Journal of the American Statistical Association, 102, 479, 1049-1058 · Zbl 1333.62295
[16] Song, T. M.; Ryu, S., Big data analysis framework for healthcare and social sectors in Korea, 2015. Healthcare Informatics Research, 21, 1, 3-9
[17] Staiger, D.; Stock, J. H., Instrumental variables regression with weak instruments, 1997. Econometrica, 65, 3, 557-586 · Zbl 0871.62101
[18] Staiger, D.; Stock, J., Instrumental variables regression with weak instruments, 1997. Econometrica, 65, 3, 557-586 · Zbl 0871.62101
[19] Stock, J. H.; Trebbi, F., Retrospectives: Who invented instrumental variable regression?, 2003. The Journal of Economic Perspectives, 17, 3, 177-194
[20] Stock, J. H.; Yogo, M., Testing for weak instruments in linear IV regression, 2002. National Bureau of Economic Research, Cambridge, Mass, USA
[21] Stock, J. H.; Wright, J. H.; Yogo, M., A survey of weak instruments and weak identification in generalized method of moments, 2002. Journal of Business & Economic Statistics, 20, 4
[22] Stock, J. H.; Wright, J. H.; Yogo, M., A survey of weak instruments and weak identification in generalized method of moments, 2012. Journal of Business & Economic Statistics