Nonparametric C- and D-vine-based quantile regression. (English) Zbl 1480.62069

Summary: Quantile regression is a field of steadily growing importance in statistical modeling. It complements linear regression, since computing a range of conditional quantile functions provides more accurate modeling of the stochastic relationship among variables, especially in the tails. We introduce a nonrestrictive and highly flexible nonparametric quantile regression approach based on C- and D-vine copulas. Vine copulas allow for separate modeling of the marginal distributions and the dependence structure in the data and can be expressed through a graphical structure consisting of a sequence of linked trees. This way, we obtain a quantile regression model that overcomes typical issues of quantile regression, such as quantile crossings, collinearity, and the need for transformations and interactions of variables. Our approach incorporates a two-step-ahead ordering of variables: it maximizes the conditional log-likelihood of the tree sequence while taking into account the next two tree levels. We show that the nonparametric conditional quantile estimator is consistent. The performance of the proposed methods is evaluated in both low- and high-dimensional settings using simulated and real-world data. The results support the superior prediction ability of the proposed models.
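
Illustration (not part of the original record): the copula-based construction in the summary can be sketched for the simplest case of one predictor. The conditional quantile of Y given X = x at level alpha is obtained by mapping x to the copula scale with the marginal distribution of X, inverting the conditional copula distribution (the inverse h-function), and mapping back with the marginal quantile function of Y. The sketch below swaps in a parametric Gaussian pair copula and normal margins so that it runs with the standard library alone; the paper itself uses nonparametric estimators and higher-dimensional C- and D-vines. The function names, the correlation value 0.6, and the margins are assumptions chosen for illustration.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used for the Gaussian copula


def gaussian_inverse_h(alpha, u, rho):
    """Inverse h-function C^{-1}_{V|U}(alpha | u) of a bivariate Gaussian copula."""
    z = _STD.inv_cdf(alpha) * (1.0 - rho ** 2) ** 0.5 + rho * _STD.inv_cdf(u)
    return _STD.cdf(z)


def conditional_quantile(alpha, x, rho, margin_x, margin_y):
    """Quantile of Y | X = x at level alpha for one predictor (illustrative only)."""
    u = margin_x.cdf(x)                    # predictor on the copula scale
    v = gaussian_inverse_h(alpha, u, rho)  # conditional quantile on the copula scale
    return margin_y.inv_cdf(v)             # back to the scale of the response


# Assumed toy setting: normal margins and copula correlation 0.6.
margin_x = NormalDist(mu=0.0, sigma=1.0)
margin_y = NormalDist(mu=10.0, sigma=2.0)
levels = [0.05, 0.5, 0.95]
quantiles = [conditional_quantile(a, 1.0, 0.6, margin_x, margin_y) for a in levels]
print(quantiles)
```

Because the inverse h-function is increasing in alpha, the estimated quantile curves are ordered by construction, which is the mechanism behind the absence of quantile crossing mentioned in the summary.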

MSC:

62G08 Nonparametric regression and quantile regression
62G05 Nonparametric estimation
62H05 Characterization and structure theory for multivariate probability distributions; copulas

References:

[1] Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In B. Petrov & F. Csáki (Eds.), Second international symposium on information theory (pp. 267-281). Budapest: Akadémiai Kiadó. · Zbl 0283.62006
[2] Athey, S., Tibshirani, J., & Wager, S. (2019). Generalized random forests. The Annals of Statistics, 47(2), 1148-1178. · Zbl 1418.62102
[3] Bartle, R. G., & Joichi, J. T. (1961). The preservation of convergence of measurable functions under composition. Proceedings of the American Mathematical Society, 12(1), 122-126. · Zbl 0097.04401
[4] Bartle, R. G., & Sherbert, D. R. (2000). Introduction to real analysis. New York: Wiley. · Zbl 0810.26001
[5] Bauer, A., & Czado, C. (2016). Pair-copula Bayesian networks. Journal of Computational and Graphical Statistics, 25(4), 1248-1271.
[6] Bedford, T., & Cooke, R. M. (2002). Vines - a new graphical model for dependent random variables. The Annals of Statistics, 30(4), 1031-1068. · Zbl 1101.62339
[7] Belloni, A., & Chernozhukov, V. (2011). ℓ1-penalized quantile regression in high-dimensional sparse models. The Annals of Statistics, 39(1), 82-130. · Zbl 1209.62064
[8] Bernard, C., & Czado, C. (2015). Conditional quantiles and tail dependence. Journal of Multivariate Analysis, 138, 104-126. · Zbl 1320.62164
[9] Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32.
[10] Bühlmann, P., & van de Geer, S. (2011). Statistics for high-dimensional data: Methods, theory and applications. Berlin/Heidelberg, Germany: Springer Science & Business Media. · Zbl 1273.62015
[11] Chang, B., & Joe, H. (2019). Prediction based on conditional distributions of vine copulas. Computational Statistics & Data Analysis, 139, 45-63. · Zbl 1507.62025
[12] Charpentier, A., Fermanian, J.-D., & Scaillet, O. (2007). The estimation of copulas: Theory and practice. In J. Rank (Ed.), Copulas: From theory to application in finance (pp. 35-64). London: Risk Books.
[13] Chen, X., Koenker, R., & Xiao, Z. (2009). Copula-based nonlinear quantile autoregression. The Econometrics Journal, 12, S50-S67. · Zbl 1182.62175
[14] Cheng, C. (1995). Uniform consistency of generalized kernel estimators of quantile density. The Annals of Statistics, 23(6), 2285-2291. · Zbl 0853.62031
[15] Cheng, K.-F. (1984). On almost sure representation for quantiles of the product limit estimator with applications. Sankhyā: The Indian Journal of Statistics, Series A, 46, 426-443. · Zbl 0568.62036
[16] Claeskens, G., & Hjort, N. (2003). The focused information criterion. Journal of the American Statistical Association, 98, 900-916. With discussion and a rejoinder by the authors. · Zbl 1045.62003
[17] Czado, C. (2019). Analyzing dependent data with vine copulas: A practical guide with R. Lecture notes in statistics. Basel, Switzerland: Springer International Publishing. · Zbl 1425.62001
[18] Dua, D., & Graff, C. (2017). UCI machine learning repository. Irvine: University of California, School of Information and Computer Sciences.
[19] Durrett, R. (2010). Probability: Theory and examples. Cambridge, UK: Cambridge University Press. · Zbl 1202.60001
[20] Fenske, N., Kneib, T., & Hothorn, T. (2011). Identifying risk factors for severe childhood malnutrition by boosting additive quantile regression. Journal of the American Statistical Association, 106(494), 494-510. · Zbl 1232.62146
[21] Geenens, G. (2014). Probit transformation for kernel density estimation on the unit interval. Journal of the American Statistical Association, 109(505), 346-358. · Zbl 1367.62107
[22] Geenens, G., Charpentier, A., & Paindaveine, D. (2017). Probit transformation for nonparametric kernel estimation of the copula density. Bernoulli, 23(3), 1848-1873. · Zbl 1392.62101
[23] Gijbels, I., & Matterne, M. (2021). Study of partial and average conditional Kendall’s tau. Dependence Modeling, 9, 82-120. · Zbl 1476.62091
[24] Gijbels, I., & Mielniczuk, J. (1990). Estimating the density of a copula function. Communications in Statistics-Theory and Methods, 19(2), 445-464. · Zbl 0900.62188
[25] Gneiting, T., & Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477), 359-378. · Zbl 1284.62093
[26] Grønneberg, S., & Hjort, N. L. (2014). The copula information criteria. Scandinavian Journal of Statistics, 41(2), 436-459. · Zbl 1416.62249
[27] Haff, I. H., Aas, K., & Frigessi, A. (2010). On the simplified pair-copula construction—simply useful or too simplistic? Journal of Multivariate Analysis, 101(5), 1296-1310. · Zbl 1184.62079
[28] Joe, H. (1996). Families of m-variate distributions with given margins and m(m−1)/2 bivariate dependence parameters. Lecture Notes-Monograph Series, 28, 120-141.
[29] Joe, H. (1997). Multivariate models and multivariate dependence concepts. Boca Raton, Florida, USA: CRC Press. · Zbl 0990.62517
[30] Joe, H. (2014). Dependence modeling with copulas. Boca Raton, Florida, USA: CRC Press. · Zbl 1346.62001
[31] Jullum, M., & Hjort, N. L. (2017). Parametric or nonparametric: The FIC approach. Statistica Sinica, 27(3), 951-981. · Zbl 1370.62012
[32] Ko, V., & Hjort, N. L. (2019). Copula information criterion for model selection with two-stage maximum likelihood estimation. Econometrics and Statistics, 12, 167-180.
[33] Ko, V., Hjort, N. L., & Hobæk Haff, I. (2019). Focused information criteria for copulas. Scandinavian Journal of Statistics, 46(4), 1117-1140. · Zbl 1443.62136
[34] Koenker, R. (2004). Quantile regression for longitudinal data. Journal of Multivariate Analysis, 91(1), 74-89. · Zbl 1051.62059
[35] Koenker, R. (2005a). Quantile regression. Cambridge, UK: Cambridge University Press. · Zbl 1111.62037
[36] Koenker, R. (2005b). Quantile regression. Econometric Society Monographs. Cambridge University Press. · Zbl 1111.62037
[37] Koenker, R. (2011). Additive models for quantile regression: Model selection and confidence bandaids. Brazilian Journal of Probability and Statistics, 25(3), 239-262. · Zbl 1236.62031
[38] Koenker, R., & Bassett, G. (1978). Regression quantiles. Econometrica: Journal of the Econometric Society, 46, 33-50. · Zbl 0373.62038
[39] Kolmogorov, A. N., & Fomin, S. (1970). Introductory real analysis. Hoboken, New Jersey, USA: Prentice-Hall. · Zbl 0213.07305
[40] Komunjer, I. (2013). Quantile Prediction. In Y. Ait-Sahalia & L. Peter Hansen (Eds.), Chapter 17 in Handbook of financial econometrics. Amsterdam, Netherlands: Elsevier.
[41] Kraus, D., & Czado, C. (2017). D-vine copula based quantile regression. Computational Statistics & Data Analysis, 110, 1-18. · Zbl 1466.62118
[42] Li, A. H., & Martin, A. (2017). Forest-type regression with general losses and robust forest. In International Conference on Machine Learning (pp. 2091-2100). PMLR.
[43] Li, Q., Lin, J., & Racine, J. S. (2013). Optimal bandwidth selection for nonparametric conditional distribution and quantile functions. Journal of Business & Economic Statistics, 31(1), 57-65.
[44] Meinshausen, N. (2006). Quantile regression forests. Journal of Machine Learning Research, 7(Jun), 983-999. · Zbl 1222.68262
[45] Nagler, T. (2018). kdecopula: An R package for the kernel estimation of bivariate copula densities. Journal of Statistical Software, 84(7), 1-22.
[46] Nagler, T. (2019). vinereg: D-Vine Quantile Regression. R package version 0.7.0.
[47] Nagler, T., Schellhase, C., & Czado, C. (2017). Nonparametric estimation of simplified vine copula models: Comparison of methods. Dependence Modeling, 5, 99-120. · Zbl 1404.62034
[48] Nagler, T., & Vatter, T. (2019a). kde1d: Univariate Kernel Density Estimation. R package version 1.0.2.
[49] Nagler, T., & Vatter, T. (2019b). rvinecopulib: High Performance Algorithms for Vine Copula Modeling. R package version 0.5.1.1.0.
[50] Noh, H., Ghouch, A. E., & Bouezmarni, T. (2013). Copula-based regression estimation and inference. Journal of the American Statistical Association, 108(502), 676-688. · Zbl 06195970
[51] Noh, H., Ghouch, A. E., & Van Keilegom, I. (2015). Semiparametric conditional quantile estimation through copula-based multivariate models. Journal of Business & Economic Statistics, 33(2), 167-178.
[52] Parzen, E. (1962). On estimation of a probability density function and mode. The Annals of Mathematical Statistics, 33(3), 1065-1076. · Zbl 0116.11302
[53] R Core Team. (2020). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
[54] Ryzin, J. V. (1969). On strong consistency of density estimates. The Annals of Mathematical Statistics, 40(5), 1765-1772. · Zbl 0198.23502
[55] Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461-464. · Zbl 0379.62005
[56] Silverman, B. W. (1978). Weak and strong uniform consistency of the kernel estimate of a density and its derivatives. The Annals of Statistics, 6, 177-184. · Zbl 0376.62024
[57] Sklar, M. (1959). Fonctions de répartition à n dimensions et leurs marges. Publications de l'Institut de Statistique de l'Université de Paris, 8, 229-231. · Zbl 0100.14202
[58] Stoeber, J., Joe, H., & Czado, C. (2013). Simplified pair copula constructions—limitations and extensions. Journal of Multivariate Analysis, 119, 101-118. · Zbl 1277.62139
[59] Tepegjozova, M. (2019). D- and C-vine quantile regression for large data sets. Masterarbeit, Technische Universität München, Garching b. München.
[60] Van Keilegom, I., & Veraverbeke, N. (1998). Bootstrapping quantiles in a fixed design regression model with censored data. Journal of Statistical Planning and Inference, 69(1), 115-131. · Zbl 0953.62040
[61] Wen, K., & Wu, X. (2015). An improved transformation-based kernel estimator of densities on the unit interval. Journal of the American Statistical Association, 110(510), 773-783. · Zbl 1373.62149
[62] Wied, D., & Weißbach, R. (2012). Consistency of the kernel density estimator: A survey. Statistical Papers, 53(1), 1-21. · Zbl 1241.62049
[63] Xiao, Z., & Koenker, R. (2009). Conditional quantile estimation for generalized autoregressive conditional heteroscedasticity models. Journal of the American Statistical Association, 104(488), 1696-1712. · Zbl 1205.62136
[64] Yeh, I.-C. (1998). Modeling of strength of high-performance concrete using artificial neural networks. Cement and Concrete Research, 28(12), 1797-1808.
[65] Yu, K., & Jones, M. (1998). Local linear quantile regression. Journal of the American Statistical Association, 93(441), 228-237. · Zbl 0906.62038
[66] Yu, K., & Moyeed, R. A. (2001). Bayesian quantile regression. Statistics & Probability Letters, 54(4), 437-447. · Zbl 0983.62017
[67] Zhu, K., Kurowicka, D., & Nane, G. F. (2021). Simplified R-vine-based forward regression. Computational Statistics & Data Analysis, 155, 107091. · Zbl 1510.62232
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.