
Lasso-driven inference in time and space. (English) Zbl 1475.62208

Summary: We consider estimation and inference in a system of high-dimensional regression equations that allows for temporal and cross-sectional dependence in the covariates and error processes, covering fairly general forms of weak temporal dependence. A sequence of regressions with many regressors is estimated by LASSO (Least Absolute Shrinkage and Selection Operator) for variable selection. An overall penalty level is chosen by a block multiplier bootstrap procedure that accounts for the multiplicity of the equations and the dependence in the data. Correspondingly, oracle properties are derived for the jointly selected tuning parameter. We further provide de-biased simultaneous inference on the many target parameters of the system and establish bootstrap consistency of the test procedure, based on a general Bahadur representation for \(Z\)-estimators with dependent data. Simulations demonstrate good performance of the proposed inference procedure. Finally, we apply the method to quantify spillover effects of textual sentiment indices in a financial market and to test the connectedness among sectors.
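The joint penalty choice described in the summary can be illustrated with a small sketch. The Python code below is a minimal, hypothetical illustration of a block multiplier bootstrap and is not the authors' implementation. The following are all assumptions for illustration: the function names, the non-overlapping block scheme, the constant `c`, the objective scaling (1/(2n))||y - Xb||^2 + lam * ||b||_1, and the use of pilot residuals in place of the unknown errors. Any standardization of the scores used in the paper is omitted. Stacking the scores x_{tk} e_{tj} of every equation j and regressor k into one matrix means that a single maximum, and hence a single penalty level, covers the whole system.

```python
import math
import random


def equation_scores(X, resid):
    """Hypothetical helper: column j*p + k holds x_{tk} * e_{tj}, so one max over
    columns spans all regressors k and all equations j (joint tuning)."""
    return [[x * e for e in r_t for x in x_t] for x_t, r_t in zip(X, resid)]


def block_multiplier_quantile(scores, block_len, alpha=0.1, n_boot=500, seed=0):
    """Approximate the (1 - alpha) quantile of max_k |n^{-1/2} sum_t scores[t][k]|.
    Each non-overlapping block of consecutive time points is multiplied by one
    N(0, 1) weight, which keeps the serial dependence inside each block."""
    n = len(scores)
    p = len(scores[0])
    n_blocks = n // block_len  # trailing observations that do not fill a block are dropped
    if n_blocks == 0:
        raise ValueError("block_len must not exceed the sample size")
    used = n_blocks * block_len
    # Pre-compute block sums S_l[k] = sum over t in block l of scores[t][k].
    block_sums = []
    for l in range(n_blocks):
        rows = scores[l * block_len:(l + 1) * block_len]
        block_sums.append([sum(row[k] for row in rows) for k in range(p)])
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        weights = [rng.gauss(0.0, 1.0) for _ in range(n_blocks)]
        # Maximum over all score columns of the absolute bootstrap sum.
        max_abs = max(
            abs(sum(w * s[k] for w, s in zip(weights, block_sums))) / math.sqrt(used)
            for k in range(p)
        )
        stats.append(max_abs)
    stats.sort()
    # Empirical quantile by a simple order-statistic rule (assumes 0 < alpha < 1).
    idx = max(0, min(n_boot - 1, int(math.ceil((1 - alpha) * n_boot)) - 1))
    return stats[idx]


def lasso_penalty(scores, block_len, c=1.1, alpha=0.1, n_boot=500, seed=0):
    """Assumed penalty rule for (1/(2n))||y - X b||^2 + lam * ||b||_1:
    lam = c * q_{1-alpha} / sqrt(n), with q from the block multiplier bootstrap."""
    q = block_multiplier_quantile(scores, block_len, alpha, n_boot, seed)
    return c * q / math.sqrt(len(scores))
```

Multiplying whole blocks by a common weight preserves the dependence within each block, which one independent multiplier per observation would destroy. The block length therefore trades off bias from ignored dependence against bootstrap variability.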

MSC:

62J07 Ridge regression; shrinkage estimators (Lasso)
62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)
62F40 Bootstrap, jackknife and other resampling methods
62P05 Applications of statistics to actuarial sciences and financial mathematics
60G46 Martingales and classical analysis

Software:

hdi
