
Dummy endogenous treatment effect estimation using high-dimensional instrumental variables. (English. French summary) Zbl 1534.62094

Summary: We develop a two-stage approach to estimate the treatment effects of dummy endogenous variables using high-dimensional instrumental variables (IVs). In the first stage, instead of using a conventional linear reduced-form regression to approximate the optimal instrument, we propose a penalized logistic reduced-form model that accommodates both the binary nature of the endogenous treatment variable and the high dimensionality of the IVs. In the second stage, we replace the original treatment variable with its estimated propensity score and run a least-squares regression to obtain a penalized logistic regression instrumental variables estimator (LIVE). We show theoretically that the proposed LIVE is root-\(n\) consistent for the true treatment effect and asymptotically normal. Monte Carlo simulations demonstrate that LIVE is more efficient than existing IV estimators for endogenous treatment effects. In applications, we use LIVE to investigate whether the Olympic Games facilitate the host nation’s economic growth and whether home visits from teachers enhance students’ academic performance. In addition, R functions implementing the proposed algorithms are available in the R package naivereg.
{© 2021 Statistical Society of Canada.}
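
The two stages described in the summary can be illustrated with a minimal Python sketch. It is not the naivereg implementation and does not reproduce the paper's tuning or inference: the plain lasso penalty, the proximal-gradient solver, the hand-picked penalty level `lam`, the simulated data, and all function and variable names (`l1_logistic`, `ols_slope`, `beta_live`, and so on) are assumptions made for illustration only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1 (componentwise shrinkage)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def l1_logistic(Z, d, lam, n_iter=500):
    """First stage (assumed form): lasso-penalized logistic regression of the
    binary treatment d on instruments Z, fitted by proximal gradient descent.
    The intercept is left unpenalized."""
    n, p = Z.shape
    X = np.column_stack([np.ones(n), Z])
    # Step size 1/L, where L bounds the Lipschitz constant of the gradient
    # of the average logistic loss.
    step = 4.0 * n / np.linalg.norm(X, 2) ** 2
    b0, b = 0.0, np.zeros(p)
    for _ in range(n_iter):
        resid = sigmoid(b0 + Z @ b) - d
        b0 -= step * resid.mean()
        b = soft_threshold(b - step * (Z.T @ resid) / n, step * lam)
    return b0, b

def ols_slope(x, y):
    """Slope from a least-squares regression of y on an intercept and x."""
    X = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

# Simulated data (hypothetical design): 50 instruments, 3 of them relevant.
rng = np.random.default_rng(0)
n, p, beta_true = 4000, 50, 1.0
Z = rng.standard_normal((n, p))
gamma = np.zeros(p)
gamma[:3] = [1.0, -1.0, 0.8]
v = rng.logistic(size=n)  # logistic error, so P(d = 1 | Z) is logistic
u = 0.8 * v / (np.pi / np.sqrt(3)) + 0.6 * rng.standard_normal(n)  # endogeneity
d = (Z @ gamma + v > 0).astype(float)
y = 1.0 + beta_true * d + u

# Stage 1: estimated propensity score from the penalized logistic model.
b0, b = l1_logistic(Z, d, lam=0.02)
p_hat = sigmoid(b0 + Z @ b)

# Stage 2: least squares of y on the estimated propensity score.
beta_live = ols_slope(p_hat, y)
beta_ols = ols_slope(d, y)  # naive regression on the endogenous dummy
print("two-stage estimate:", beta_live)
print("naive OLS estimate:", beta_ols)
```

In this design the naive regression on the treatment dummy is biased because the outcome error is correlated with the treatment, whereas the second-stage regression uses only variation in the treatment that is predicted by the instruments. The standard errors printed by an ordinary least-squares routine in the second stage would not account for the first-stage estimation and should not be used as they stand.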

MSC:

62J07 Ridge regression; shrinkage estimators (Lasso)
62H12 Estimation in multivariate analysis
62J12 Generalized linear models (logistic models)

Software:

R; naivereg
