
Feature selection algorithms in generalized additive models under concurvity. (English) Zbl 07876386

Summary: In this paper, the properties of 10 different feature selection algorithms for generalized additive models (GAMs) are compared on one simulated and two real-world datasets under concurvity. Concurvity can be interpreted as redundancy in the feature set of a GAM. Like multicollinearity in linear models, concurvity causes unstable parameter estimates in GAMs and makes the marginal effects of features harder to interpret. Feature selection algorithms for GAMs can be separated into four clusters: stepwise, boosting, regularization and concurvity-controlled methods. Our numerical results show that algorithms with no constraints on concurvity tend to select a large feature set, without significant improvements in predictive performance compared to a more parsimonious feature set, and the large feature set is accompanied by harmful concurvity in the proposed models. To tackle the concurvity phenomenon, recent feature selection algorithms such as the mRMR and the HSIC-Lasso incorporate constraints on concurvity in their objective functions. However, these algorithms interpret concurvity as a pairwise non-linear relationship between features, so they do not account for the case in which a feature can be accurately estimated as a multivariate function of several other features; this is confirmed by our numerical results. Our own solution to the problem, a hybrid genetic-harmony search algorithm (HA), introduces constraints on multivariate concurvity directly. Due to this constraint, the HA proposes a small, non-redundant feature set with predictive performance similar to that of models with far more features.
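To make the distinction between pairwise and multivariate concurvity concrete, the following R sketch fits a toy GAM and inspects both views with the concurvity() function of the mgcv package (the package accompanying Wood [40, 41]); the simulated data, variable names and bandwidth below are illustrative assumptions, not the authors' actual setup.

# Toy illustration of pairwise vs. multivariate concurvity in a GAM.
# Assumes the mgcv package; the data and names are illustrative only.
library(mgcv)

set.seed(1)
n  <- 500
x1 <- runif(n)
x2 <- runif(n)
# x3 adds no new information: it is a function of x1 and x2 plus noise
x3 <- x1 + x2 + rnorm(n, sd = 0.05)
y  <- sin(2 * pi * x1) + x2^2 + rnorm(n, sd = 0.2)

b <- gam(y ~ s(x1) + s(x2) + s(x3))

# Both calls return values in [0, 1]; values near 1 mean a smooth term
# can be approximated almost perfectly by the other terms in the model.
concurvity(b, full = TRUE)   # each term vs. all remaining terms jointly
concurvity(b, full = FALSE)  # breakdown between pairs of terms

Here the pairwise view (full = FALSE) reports only moderate concurvity between x3 and each of x1 and x2, while the multivariate view (full = TRUE) flags x3 as almost entirely redundant, which is the blind spot of pairwise criteria described above. The pairwise dependence measure underlying the HSIC-Lasso can itself be sketched in a few lines; this is the standard biased empirical HSIC estimator of Gretton et al. [13] with Gaussian kernels, written out only to show that it compares two variables at a time:

# Biased empirical HSIC estimator (Gretton et al. 2005), Gaussian kernels.
# A pairwise measure: it sees one pair (x, y), never several features jointly.
hsic <- function(x, y, sigma = 1) {
  n <- length(x)
  K <- exp(-as.matrix(dist(x))^2 / (2 * sigma^2))  # kernel matrix of x
  L <- exp(-as.matrix(dist(y))^2 / (2 * sigma^2))  # kernel matrix of y
  H <- diag(n) - matrix(1 / n, n, n)               # centering matrix
  sum(diag(K %*% H %*% L %*% H)) / (n - 1)^2
}

hsic(x1, x3, sigma = 0.5)  # detects x1-x3 dependence, misses x3 ~ f(x1, x2)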

MSC:

62-08 Computational methods for problems pertaining to statistics

References:

[1] Altman, N.; Krzywinski, M., Analyzing outliers: Influential or nuisance?, Nat Methods, 13, 4, 281-283, 2016 · doi:10.1038/nmeth.3812
[2] Amodio, S.; Aria, M.; D’Ambrosio, A., On concurvity in nonlinear and nonparametric regression models, Statistica, 74, 1, 85-98, 2014
[3] Augustin, NH; Sauleau, EA; Wood, SN, On quantile quantile plots for generalized linear models, Comput Stat Data Anal, 56, 8, 2404-2409, 2012 · Zbl 1252.62072 · doi:10.1016/j.csda.2012.01.026
[4] Belitz, C.; Lang, S., Simultaneous selection of variables and smoothing parameters in structured additive regression models, Comput Stat Data Anal, 53, 1, 61-81, 2008 · Zbl 1452.62029 · doi:10.1016/j.csda.2008.05.032
[5] Binder, H.; Tutz, G., A comparison of methods for the fitting of generalized additive models, Stat Comput, 18, 1, 87-99, 2008 · doi:10.1007/s11222-007-9040-0
[6] Breheny, P.; Huang, J., Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection, Ann Appl Stat, 5, 1, 232-253, 2011 · Zbl 1220.62095 · doi:10.1214/10-AOAS388
[7] Cantoni, E.; Flemming, JM; Ronchetti, E., Variable selection in additive models by non-negative garrote, Stat Model, 11, 3, 237-252, 2011 · doi:10.1177/1471082X1001100304
[8] Chong, IG; Jun, CH, Performance of some variable selection methods when multicollinearity is present, Chemom Intell Lab Syst, 78, 1-2, 103-112, 2005 · doi:10.1016/j.chemolab.2004.12.011
[9] Climente-González, H.; Azencott, CA; Kaski, S.; Yamada, M., Block HSIC Lasso: model-free biomarker detection for ultra-high dimensional data, Bioinformatics, 35, 14, i427-i435, 2019 · doi:10.1093/bioinformatics/btz333
[10] De Jay, N.; Papillon-Cavanagh, S.; Olsen, C.; El-Hachem, N.; Bontempi, G.; Haibe-Kains, B., mRMRe: an R package for parallelized mRMR ensemble feature selection, Bioinformatics, 29, 18, 2365-2368, 2013 · doi:10.1093/bioinformatics/btt383
[11] Du, M.; Liu, N.; Hu, X., Techniques for interpretable machine learning, Commun ACM, 63, 1, 68-77, 2019 · doi:10.1145/3359786
[12] Efroymson MA (1960) Multiple regression analysis. In: Ralston A, Wilf HS (eds) Mathematical methods for digital computers. John Wiley, New York, pp 191-203 · Zbl 0089.12602
[13] Gretton A, Bousquet O, Smola A, Schölkopf B (2005) Measuring statistical dependence with Hilbert-Schmidt norms. In: International conference on algorithmic learning theory. Springer, Berlin, pp 63-77 · Zbl 1168.62354
[14] Gu, H.; Kenney, T.; Zhu, M., Partial generalized additive models: an information-theoretic approach for dealing with concurvity and selecting variables, J Comput Graph Stat, 19, 3, 531-551, 2010 · doi:10.1198/jcgs.2010.07139
[15] Hall MA (1999) Correlation-based feature selection for machine learning. Dissertation, University of Waikato.
[16] Hartigan, JA; Wong, MA, Algorithm AS 136: A k-means clustering algorithm, J R Stat Soc Ser C Appl Stat, 28, 1, 100-108, 1979 · Zbl 0447.62062 · doi:10.2307/2346830
[17] Hastie, TJ; Tibshirani, RJ, Generalized additive models, 1990, London: Chapman and Hall, London · Zbl 0747.62061
[18] Hastie TJ (2018) gam: generalized additive models. R package version 1.16. https://CRAN.R-project.org/package=gam
[19] Huo X, Ni X (2007) When do stepwise algorithms meet subset selection criteria? Ann Stat 35(2):870-887. https://www.jstor.org/stable/25463581 · Zbl 1125.62079
[20] James, G.; Witten, D.; Hastie, TJ; Tibshirani, R., An introduction to statistical learning: with applications in R, 2013, New York: Springer, New York · Zbl 1281.62147 · doi:10.1007/978-1-4614-7138-7
[21] Jia, J.; Yu, B., On model selection consistency of the elastic net, Stat Sin, 20, 595-611, 2010 · Zbl 1187.62125
[22] Kuhn M, Wing J, Weston S, Williams A, Keefer C, Engelhardt A, Cooper T, Mayer Z, Kenkel B, the R Core Team, Benesty M, Lescarbeau R, Ziem A, Scrucca L, Tang Y, Candan C, Hunt T (2019) caret: Classification and Regression Training. R package version 6.0-84. https://CRAN.R-project.org/package=caret
[23] Lai, J.; Lortie, CJ; Muenchen, RA; Yang, J.; Ma, K., Evaluating the popularity of R in ecology, Ecosphere, 10, 1, e02567, 2019 · doi:10.1002/ecs2.2567
[24] Láng, B.; Kovács, L.; Mohácsi, L., Linear regression model selection using a hybrid genetic - Improved harmony search parallelized algorithm, SEFBIS J, 11, 1, 2-9, 2017
[25] Lin, Y.; Zhang, HH, Component selection and smoothing in multivariate nonparametric regression, Ann Stat, 34, 5, 2272-2297, 2006 · Zbl 1106.62041 · doi:10.1214/009053606000000722
[26] Mansfield, ER; Helms, BP, Detecting multicollinearity, Am Stat, 36, 3, 158-160, 1982 · doi:10.1080/00031305.1982.10482818
[27] Marra, G.; Wood, SN, Practical variable selection for generalized additive models, Comput Stat Data Anal, 55, 7, 2372-2387, 2011 · Zbl 1328.62475 · doi:10.1016/j.csda.2011.02.004
[28] McFadden, D., Conditional logit analysis of qualitative choice behaviour. In: Zarembka, P. (ed), Frontiers in econometrics, 105-142, 1974, New York: Academic Press, New York
[29] Molnar, C., Interpretable machine learning, 2020, Victoria: Leanpub, Victoria
[30] Perperoglou, A.; Sauerbrei, W.; Abrahamowicz, M.; Schmid, M., A review of spline function procedures in R, BMC Med Res Methodol, 19, 1, 1-16, 2019 · doi:10.1186/s12874-019-0666-3
[31] Ramsay, TO; Burnett, RT; Krewski, D., The effect of concurvity in generalized additive models linking mortality to ambient particulate matter, Epidemiology, 14, 1, 18-23, 2003 · doi:10.1097/00001648-200301000-00009
[32] Schapire, RE, The strength of weak learnability, Mach Learn, 5, 197-227, 1990 · doi:10.1007/BF00116037
[33] Schmid, M.; Hothorn, T., Boosting additive models using component-wise P-splines, Comput Stat Data Anal, 53, 2, 298-311, 2008 · Zbl 1231.62071 · doi:10.1016/j.csda.2008.09.009
[34] Signoretto M, Pelckmans K, Suykens JA (2008) Functional ANOVA Models: Convex-concave approach and concurvity analysis (No. 08-203). Internal Report.
[35] Therneau T, Atkinson B (2018) rpart: recursive partitioning and regression trees. R package version 4.1-13. https://CRAN.R-project.org/package=rpart
[36] Tibshirani, R., Regression shrinkage and selection via the lasso, J R Stat Soc Ser B (Methodol), 58, 1, 267-288, 1996 · Zbl 0850.62538 · doi:10.1111/j.2517-6161.1996.tb02080.x
[37] Tutz, G.; Binder, H., Generalized additive modeling with implicit variable selection by likelihood-based boosting, Biometrics, 62, 4, 961-971, 2006 · Zbl 1116.62075 · doi:10.1111/j.1541-0420.2006.00578.x
[38] Weston S (2019a) foreach: provides foreach looping construct. R package version 1.4.7. https://CRAN.R-project.org/package=foreach
[39] Weston S (2019b) doParallel: Foreach Parallel Adaptor for the ’parallel’ Package. R package version 1.0.15. https://CRAN.R-project.org/package=doParallel
[40] Wood, SN, Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models, J R Stat Soc Ser B Stat Methodol, 73, 1, 3-36, 2011 · Zbl 1411.62089 · doi:10.1111/j.1467-9868.2010.00749.x
[41] Wood, SN, Generalized additive models: an introduction with R, 2017, London: Chapman and Hall/CRC, London · Zbl 1368.62004 · doi:10.1201/9781315370279
[42] Wooldridge, JM, Introductory econometrics: a modern approach, 2016, Toronto: Nelson Education, Toronto
[43] Yang, S.; Zhang, H., Comparison of several data mining methods in credit card default prediction, Intell Inf Manag, 10, 5, 115-122, 2018 · doi:10.4236/iim.2018.105010
[44] Yeh, IC, Modeling of strength of high-performance concrete using artificial neural networks, Cem Concr Res, 28, 12, 1797-1808, 1998 · doi:10.1016/S0008-8846(98)00165-3
[45] Yeh, IC; Lien, CH, The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients, Expert Syst Appl, 36, 2, 2473-2480, 2009 · doi:10.1016/j.eswa.2007.12.020
[46] Zhang HH, Lin CY (2013) cosso: fit regularized nonparametric regression models using COSSO penalty. R package version 2.1-1. https://CRAN.R-project.org/package=cosso
[47] Zhao, P.; Yu, B., On model selection consistency of Lasso, J Mach Learn Res, 7, 2541-2563, 2006 · Zbl 1222.62008