Variable selection in heteroscedastic regression models under general skew-\(t\) distributional models using information complexity. (English) Zbl 07616799

Bekker, Andriëtte (ed.) et al., Computational and methodological statistics and biostatistics. Contemporary essays in advancement. Cham: Springer. Emerg. Top. Stat. Biostat., 73-98 (2020).
Summary: In this paper we study several competing models under a general class of skew-\(t\) distributions. Namely, we consider the joint location and scale model (JLSM) under Student’s \(t\) and under skew-\(t\) distributions, respectively. Similarly, we consider the extension of the JLSM to the joint location, scale and skewness model (JLSSM) under the skew-\(t\) distribution in heteroscedastic regression models, both for subset selection of variables and to deal with heavy-tailedness and skewness in a data set. To this end, for the first time, we introduce and develop the information-theoretic measure of complexity (ICOMP) criterion in such problems to select the best subset of predictor variables. We provide the computational forms of the Fisher information and inverse Fisher information matrices for these models, to be used in ICOMP. A large-scale Monte Carlo simulation study is carried out to assess the performance of ICOMP in such complicated models. In addition, an example on a real benchmark data set is provided to select the best subset of the predictors under these three competing models without knowing the true structure and the distributional form of the regression model. The results show the flexibility and versatility of our approach for model selection in complex models.
For the entire collection see [Zbl 1486.62001].
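
The summary states that ICOMP is built from the inverse Fisher information matrix but does not give the formula. As a minimal sketch, assuming the form commonly given in Bozdogan's earlier work, ICOMP(IFIM) = -2 log L + 2 C1(F^{-1}) with C1(Σ) = (s/2) log(tr(Σ)/s) - (1/2) log det(Σ), the criterion could be computed as below. The function names and the toy numbers are hypothetical, and the exact variant used in the chapter may differ.

```python
import numpy as np


def c1_complexity(cov):
    """First-order maximal entropic complexity C1 of a covariance matrix.

    Assumed form: C1 = (s/2) * log(tr(cov)/s) - (1/2) * log det(cov),
    where s is the dimension of the (positive definite) matrix.
    """
    cov = np.asarray(cov, dtype=float)
    s = cov.shape[0]
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("matrix must be positive definite")
    return 0.5 * s * np.log(np.trace(cov) / s) - 0.5 * logdet


def icomp_ifim(loglik, inv_fisher):
    """ICOMP based on the estimated inverse Fisher information matrix."""
    return -2.0 * loglik + 2.0 * c1_complexity(inv_fisher)


def best_by_icomp(candidates):
    """Return the name of the candidate with the smallest ICOMP, plus all scores.

    `candidates` maps a model name to (maximized log-likelihood, inverse Fisher matrix).
    """
    scores = {name: icomp_ifim(ll, cov) for name, (ll, cov) in candidates.items()}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Made-up numbers for illustration only; not results from the paper.
    toy = {
        "subset_A": (-10.0, np.eye(2)),
        "subset_B": (-10.0, np.diag([1.0, 4.0])),
    }
    print(best_by_icomp(toy))
```

C1 is zero when all eigenvalues of the matrix are equal and grows as they spread apart, so two models with the same log-likelihood are ranked by how unevenly their parameter estimates vary and covary.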

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis

Software:

sn; GLIM

References:

[1] Aitkin, M. (1987). Modelling variance heterogeneity in normal regression using GLIM. Journal of the Royal Statistical Society. Series C (Applied Statistics), 36, 332-339.
[2] Akaike, H. (1973). Information theory and extension of the maximum likelihood principle. In B. N. Petrov & F. Csáki (Eds.), Second International Symposium on Information Theory (pp. 267-281). Budapest: Akadémiai Kiadó. · Zbl 0283.62006
[3] Arslan, O., & Genç, A. I. (2003). Robust location and scale estimation based on the univariate generalized t (GT) distribution. Communications in Statistics: Theory and Methods, 32(8), 1505-1525. · Zbl 1184.62024
[4] Azzalini, A. (1985). A class of distributions which includes the normal ones. Scandinavian Journal of Statistics, 12, 171-178. · Zbl 0581.62014
[5] Azzalini, A. (2005). The skew-normal distribution and related multivariate families. Scandinavian Journal of Statistics, 32(2), 159-188. · Zbl 1091.62046
[6] Azzalini, A., & Capitanio, A. (2003). Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t distribution. Journal of the Royal Statistical Society, Series B, 65, 367-389. · Zbl 1065.62094
[7] Azzalini, A., & Dalla-Valle, A. (1996). The multivariate skew-normal distribution. Biometrika, 83, 715-726. · Zbl 0885.62062
[8] Bozdogan, H. (1988). ICOMP: A new model selection criterion. In H. H. Bock (Ed.), Classification and related methods of data analysis (pp. 599-608). Amsterdam: North-Holland. · Zbl 0729.62551
[9] Bozdogan, H. (1990). On the information-based measure of covariance complexity and its application to the evaluation of multivariate linear models. Communications in Statistics, Theory and Methods, 19, 221-278. · Zbl 0900.62041
[10] Bozdogan, H. (1994). Mixture-model cluster analysis using model selection criteria and a new informational measure of complexity. In H. Bozdogan (Ed.), Multivariate statistical modeling. Proceedings of the First US/Japan Conference on the Frontiers of Statistical Modeling: An Informational Approach (Vol. 2, pp. 69-113). Dordrecht: Kluwer Academic Publishers.
[11] Bozdogan, H. (1997, October). Empirical econometric modeling of food consumption using a new informational complexity approach. Invited world title award winning paper in the Journal of Applied Econometrics, 12, 563-592 (with Peter M. Bearse, and Alan M. Schlottmann). This paper is also published in the book: Magnus, J. R., & Morgan, M. S. (1997). Methodology & tacit knowledge: Two experiments in econometrics (pp. 153-182). Chichester: Wiley.
[12] Bozdogan, H. (2000). Akaike’s information criterion and recent developments in information complexity. Journal of Mathematical Psychology, 44(1), 62-91. · Zbl 1047.62501
[13] Bozdogan, H. (2003). Intelligent statistical data mining with information complexity and genetic algorithms. In Statistical data mining and knowledge discovery. Joint International Summer School JISS-2003 (Vol. II). Universidade de Lisboa, Lisbon.
[14] Bozdogan, H. (2004). Intelligent statistical data mining with information complexity and genetic algorithms. In H. Bozdogan (Ed.), Statistical data mining and knowledge discovery (pp. 47-88). Boca Raton: Chapman and Hall/CRC.
[15] Bozdogan, H., & Haughton, D. (1998). Informational complexity criteria for regression models. Computational Statistics and Data Analysis, 28, 51-76. · Zbl 1042.62504
[16] Cook, R. D., & Weisberg, S. (1994). An introduction to regression graphics. New York: Wiley. · Zbl 0925.62287
[17] Harvey, A. C. (1976). Estimating regression models with multiplicative heteroscedasticity. Econometrica, 44, 460-465. · Zbl 0333.62040
[18] Lange, K. L., Little, R. J. A., & Taylor, J. M. G. (1989). Robust statistical modeling using the t distribution. Journal of the American Statistical Association, 84, 881-896.
[19] Li, H., Wu, L., & Ma, T. (2017). Variable selection in joint location, scale and skewness models of the skew normal distribution. Journal of Systems Science and Complexity, 30(3), 694-709. · Zbl 1369.62166
[20] Li, H. Q., & Wu, L. C. (2014). Joint modelling of location and scale parameters of the skew-normal distribution. Applied Mathematics-A Journal of Chinese Universities, 29(3), 265-272. · Zbl 1324.62011
[21] Lin, T. I., & Wang, Y. J. (2009). A robust approach to joint modeling of mean and scale covariance for longitudinal data. Journal of Statistical Planning and Inference, 139(9), 3013-3026. · Zbl 1168.62082
[22] Lin, T. I., & Wang, W. L. (2011). Bayesian inference in joint modelling of location and scale parameters of the t distribution for longitudinal data. Journal of Statistical Planning and Inference, 141(4), 1543-1553. · Zbl 1204.62040
[23] Park, R. E. (1966). Estimation with heteroscedastic error terms. Econometrica, 34, 888.
[24] Taylor, J. T., & Verbyla, A. P. (2004). Joint modelling of location and scale parameters of the t distribution. Statistical Modelling, 4, 91-112. · Zbl 1112.62010
[25] Van Emden, M. H. (1971). An analysis of complexity. Mathematical Centre Tracts, 35. Mathematisch Centrum: Amsterdam. · Zbl 0225.68015
[26] Wang, D. R., & Zhang, Z. Z. (2009). Variable selection in joint generalized linear models. Chinese Journal of Applied Probability and Statistics, 25, 245-256. · Zbl 1211.62121
[27] Wu, L. C. (2014). Variable selection in joint location and scale models of the skew-t-normal distribution. Communications in Statistics - Simulation and Computation, 43(3), 615-630. · Zbl 1291.62062
[28] Wu, L. C., & Li, H. Q. (2012). Variable selection for joint mean and dispersion models of the inverse Gaussian distribution. Metrika, 75, 795-808. · Zbl 1410.62132
[29] Wu, L. C., Ma, T., & Zhan, J. (2013). Maximum likelihood estimation for joint location, scale and skewness models of the StN distribution. Applied Mathematics A Journal of Chinese Universities (Ser. A), 4, 6. · Zbl 1299.62015
[30] Wu, L., Tian, G. L., Zhang, Y. Q., & Ma, T. (2017). Variable selection in joint location, scale and skewness models with a skew-t-normal distribution. Statistics and Its Interface, 10(2), 217-227. · Zbl 1388.62052
[31] Wu, L. C., Zhang, Z. Z., Tian, G. L., & Xu, D. K. (2016). A robust variable selection to t-type joint generalized linear models via penalized t-type pseudo-likelihood. Communications in Statistics - Simulation and Computation, 45(7), 2320-2337. · Zbl 1346.62049
[32] Wu, L. C., Zhang, Z. Z., & Xu, D. K. (2012). Variable selection in joint mean and variance models of Box-Cox transformation. Journal of Applied Statistics, 39, 2543-2555. · Zbl 1514.62133
[33] Wu, L. C., Zhang, Z. Z., & Xu, D. K. (2013). Variable selection in joint location and scale models of the skew-normal distribution. Journal of Statistical Computation and Simulation, 83, 1266-1278. · Zbl 1431.62293
[34] Zhang, Z. Z., & Wang, D. R. (2011). Simultaneous variable selection for heteroscedastic regression models. Science China Mathematics, 54, 515-530. · Zbl 1216.62104
[35] Zhao, W., & Zhang, R. (2015). Variable selection of varying dispersion student-t regression models. Journal of Systems Science and Complexity, 28(4), 961-977. · Zbl 1320.93080