Regression model selection via log-likelihood ratio and constrained minimum criterion. (English. French summary) Zbl 07836163

Summary: Although the log-likelihood is widely used in model selection, the log-likelihood ratio has seen few applications in this area. We develop a log-likelihood-ratio-based method for selecting regression models by focusing on the set of models deemed plausible by the likelihood ratio test. We show that when the sample size is large and the significance level of the test is small, there is a high probability that the smallest model in this set is the true model; we therefore select this smallest model. The significance level of the test serves as a tuning parameter for the method. In a simulation study, we consider three levels of this parameter and compare the method with the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), demonstrating its excellent accuracy and adaptability to different sample sizes. The method is a frequentist alternative and a strong competitor to AIC and BIC for selecting regression models.
© 2023 Statistical Society of Canada / Société statistique du Canada.
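The selection rule sketched in the summary (retain the submodels that the likelihood ratio test does not reject against the full model at level alpha, then pick the smallest such model) can be illustrated for Gaussian linear regression as follows. This is a minimal sketch, not the authors' implementation: the function names, the exhaustive subset scan, the chi-squared calibration of the test, and the log-likelihood tie-break among equally small plausible models are all illustrative assumptions.

```python
# Illustrative sketch of likelihood-ratio-based model selection: among all
# submodels not rejected by the LRT against the full model at level alpha,
# return the smallest one. Names and details are assumptions for illustration.
import numpy as np
from itertools import combinations
from scipy.stats import chi2

def gaussian_loglik(y, X):
    """Maximized Gaussian log-likelihood of the OLS regression of y on X."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    return -0.5 * n * (np.log(2.0 * np.pi * rss / n) + 1.0)

def lrt_select(y, X, alpha=0.01):
    """Smallest submodel (intercept always kept) deemed plausible by the LRT.

    X is the full design matrix with the intercept in column 0; alpha is the
    significance level of the test, the method's tuning parameter.
    """
    n, p = X.shape
    ll_full = gaussian_loglik(y, X)
    for k in range(1, p):                      # submodel size, smallest first
        plausible = []
        for subset in combinations(range(1, p), k - 1):
            cols = [0] + list(subset)          # candidate columns, incl. intercept
            ll_sub = gaussian_loglik(y, X[:, cols])
            stat = 2.0 * (ll_full - ll_sub)    # LRT statistic vs. the full model
            if stat <= chi2.ppf(1.0 - alpha, df=p - k):
                plausible.append((ll_sub, cols))
        if plausible:                          # smallest plausible size reached
            return max(plausible)[1]           # tie-break by log-likelihood
    return list(range(p))                      # no proper submodel is plausible
```

A smaller alpha enlarges the acceptance region of the test, so more (and hence smaller) submodels qualify as plausible; this is the sense in which alpha tunes the method's parsimony.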

MSC:

62-XX Statistics
