A procedure for variable selection in double generalized linear models. (English) Zbl 07572882

Summary: Double generalized linear models (DGLMs) allow the dispersion parameter of the response variable to be modeled as a function of explanatory variables. They are therefore a possible solution when the assumption of a constant dispersion parameter is unreasonable and the response variable follows a distribution from the exponential family. As in other classes of regression models, variable selection is an important step in fitting a DGLM. In this work, we propose the \(k\)-steps variable selection scheme for double generalized linear models, where \(k\) is the number of steps required to achieve convergence. To check the performance of our procedure, we performed Monte Carlo simulation studies. The results indicate that our procedure for variable selection performs, in general, as well as or better than the other methods studied, without requiring a large computational cost. We also evaluated the \(k\)-steps variable selection scheme using real data. The results suggest that our procedure can also be a good alternative when prediction is the main goal of the model.
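
For orientation, a DGLM in the sense of Smyth [1] can be sketched as a pair of linked submodels, one for the mean and one for the dispersion. The notation below (\(\mu_i\), \(\phi_i\), \(x_i\), \(z_i\), \(\beta\), \(\gamma\), the link functions \(g\) and \(h\), and the variance function \(V\)) is assumed here for illustration and is not taken from the paper under review.

\[
\operatorname{E}(Y_i)=\mu_i,\qquad \operatorname{Var}(Y_i)=\phi_i\,V(\mu_i),\qquad g(\mu_i)=x_i^{\top}\beta,\qquad h(\phi_i)=z_i^{\top}\gamma,\qquad i=1,\dots,n.
\]

Under this formulation, variable selection means choosing which covariates enter \(x_i\) (mean submodel) and which enter \(z_i\) (dispersion submodel). A constant-dispersion GLM is the special case in which \(z_i\) contains only an intercept.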

MSC:

62-XX Statistics
Full Text: DOI

References:

[1] Smyth, GK., Generalized linear models with varying dispersion, J R Stat Soc, 51, 1, 47-60 (1989)
[2] Akaike, H., A new look at the statistical model identification, IEEE Trans Automat Contr, 19, 6, 716-723 (1974) · Zbl 0314.62039
[3] Akaike, H., A Bayesian analysis of the minimum AIC procedure, Ann Inst Stat Math, 30, 9-14 (1978) · Zbl 0441.62007
[4] Schwarz, G., Estimating the dimension of a model, Ann Stat, 6, 2, 461-464 (1978) · Zbl 0379.62005
[5] Hannan, EJ; Quinn, BG., The determination of the order of an autoregression, J R Stat Soc, 41, 2, 190-195 (1979) · Zbl 0408.62076
[6] Hurvich, CM; Tsai, C-L., Regression and time series model selection in small samples, Biometrika, 76, 2, 297-307 (1989) · Zbl 0669.62085
[7] Draper, N.; Smith, H., Applied regression analysis (1967), New York: John Wiley and Sons · Zbl 0158.17101
[8] Claeskens, G.; Hjort, NL., Model selection and model averaging, 330 (2008), Cambridge: Cambridge University Press · Zbl 1166.62001
[9] Cottet, R.; Kohn, RJ; Nott, DJ., Variable selection and model averaging in semiparametric overdispersed generalized linear models, J Am Stat Assoc, 103, 482, 661-671 (2008) · Zbl 1469.62311
[10] Xu, D.; Zhang, Z.; Wu, L., Variable selection in high-dimensional double generalized linear models, Stat Pap, 55, 2, 327-347 (2014) · Zbl 1297.62127
[11] Bayer, F.; Cribari-Neto, F., Bootstrap-based model selection criteria for beta regressions, TEST, 24, 4, 776-795 (2015) · Zbl 1329.62347
[12] Bayer, FM; Cribari-Neto, F., Model selection criteria in beta regression with varying dispersion, Commun Stat Simul Comput, 46, 1, 729-746 (2017) · Zbl 1364.62191
[13] Antoniadis, A.; Gijbels, I.; Lambert-Lacroix, S., Joint estimation and variable selection for mean and dispersion in proper dispersion models, Electron J Stat, 10, 1, 1630-1676 (2016) · Zbl 1404.62074
[14] R Development Core Team, R: A language and environment for statistical computing (2008), Vienna: R Foundation for Statistical Computing
[15] Stasinopoulos, MD; Rigby, RA; Heller, GZ, Flexible regression and smoothing: using GAMLSS in R (2017), Boca Raton: Chapman and Hall/CRC Press
[16] Ramires, GT; Nakamura, RL; Righetto, JA, Validation of stepwise-based procedure in GAMLSS, J Data Sci, 19, 1, 96-110 (2021)
[17] Stasinopoulos, DM; Rigby, RA., Generalized additive models for location scale and shape (GAMLSS) in R, J Stat Softw, 23, 7, 1-46 (2007)
[18] Demirtas, H., Pseudo-random number generation in R for commonly used multivariate distributions, J Mod Appl Stat Methods, 3, 2, 485-497 (2004)
[19] Buckland, ST; Burnham, KP; Augustin, NH., Model selection: an integral part of inference, Biometrics, 53, 2, 603-618 (1997) · Zbl 0885.62118
[20] Winner. PGA performance statistics and winnings; 2020 [cited 2020 Jun 3]. Available from: http://users.stat.ufl.edu/winner/data/pga2004.dat
[21] Filliben, JJ., The probability plot correlation coefficient test for normality, Technometrics, 17, 1, 111-117 (1975) · Zbl 0295.62049
[22] Dunn, PK; Smyth, GK., Randomized quantile residuals, J Comput Graph Stat, 5, 3, 236-244 (1996)
[23] Breiman, L., Random forests, Mach Learn, 45, 1, 5-32 (2001) · Zbl 1007.68152
[24] Hoerl, AE; Kennard, RW., Ridge regression: biased estimation for nonorthogonal problems, Technometrics, 12, 1, 55-67 (1970) · Zbl 0202.17205
[25] Smithson, M.; Verkuilen, J., A better lemon squeezer? maximum-likelihood regression with beta-distributed dependent variables, Psychol Methods, 11, 1, 54-71 (2006)
[26] Santos-Neto, M.; Cysneiros, FJA; Leiva, V., Reparameterized Birnbaum-Saunders regression models with varying precision, Electron J Stat, 10, 2, 2825-2855 (2016) · Zbl 1348.62220
[27] Rigby, RA; Stasinopoulos, DM., Generalized additive models for location, scale and shape, J R Stat Soc, 54, 3, 507-554 (2005) · Zbl 1490.62201