Model confidence bounds for variable selection. (English) Zbl 1436.62593

Summary: In this article, we introduce the concept of model confidence bounds (MCB) for variable selection in the context of nested models. In analogy with the endpoints of the familiar confidence interval for parameter estimation, the MCB identifies two nested models (the lower and upper confidence bound models) that contain the true model at a given level of confidence. Instead of trusting a single model obtained from a given model selection method, the MCB proposes a group of nested models as candidates, and its width and composition enable the practitioner to assess the overall model selection uncertainty. A new graphical tool, the model uncertainty curve (MUC), is introduced to visualize the variability of model selection and to compare different model selection procedures. The MCB methodology is implemented by a fast bootstrap algorithm that is shown to yield the correct asymptotic coverage under rather general conditions. Our Monte Carlo simulations and real data examples confirm the validity and illustrate the advantages of the proposed method.
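The summary describes the bootstrap construction only at a high level. The following Python sketch (not part of the zbMATH entry or the authors' code) illustrates one way such a procedure could be organized: resample the data, rerun a selector, and return the narrowest nested pair of variable sets that "sandwiches" a 1 - alpha fraction of the bootstrap selections. The cross-validated lasso selector, the frequency-ordered candidate bounds, and all function names are assumptions made for illustration, not the paper's prescribed algorithm.

```python
import numpy as np
from sklearn.linear_model import LassoCV  # example selector; any selection method could be plugged in


def select_vars(X, y):
    """Illustrative selector: indices of variables with nonzero
    cross-validated lasso coefficients (an assumption for this sketch,
    not the only choice compatible with the MCB idea)."""
    return frozenset(int(j) for j in np.flatnonzero(LassoCV(cv=5).fit(X, y).coef_))


def model_confidence_bounds(X, y, alpha=0.05, B=200, seed=0):
    """Bootstrap sketch of model confidence bounds: resample observations,
    rerun the selector, then return the narrowest nested pair (lower, upper)
    of variable sets that covers at least 1 - alpha of the bootstrap
    selections.  Candidate bounds are built from bootstrap selection
    frequencies, a simplification of the search described in the paper."""
    rng = np.random.default_rng(seed)
    n, p = X.shape

    # Pairs bootstrap: resample rows with replacement and reselect variables.
    sels = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)
        sels.append(select_vars(X[idx], y[idx]))

    # Rank variables by how often the bootstrap selected them.
    freq = np.zeros(p)
    for s in sels:
        freq[list(s)] += 1
    order = [int(j) for j in np.argsort(-freq)]

    # Search widths from narrowest to widest; the trivial pair
    # (empty set, all variables) at width p always covers, so this returns.
    for width in range(p + 1):
        for k in range(p - width + 1):
            lower = frozenset(order[:k])
            upper = frozenset(order[:k + width])
            coverage = np.mean([lower <= s <= upper for s in sels])
            if coverage >= 1 - alpha:
                return lower, upper, coverage
```

Calling model_confidence_bounds(X, y, alpha=0.05) on a design matrix and response would return a lower and an upper variable set together with the achieved bootstrap coverage; recording the coverage attained at each width would trace out a curve in the spirit of the model uncertainty curve.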

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
62G15 Nonparametric tolerance and confidence regions
