A user-guided Bayesian framework for ensemble feature selection in life science applications (UBayFS). (English) Zbl 07624290

Summary: Feature selection reduces the complexity of high-dimensional datasets and helps to gain insights into systematic variation in the data. These aspects are essential in domains that rely on model interpretability, such as life sciences. We propose a (U)ser-Guided (Bay)esian Framework for (F)eature (S)election, UBayFS, an ensemble feature selection technique embedded in a Bayesian statistical framework. Our generic approach considers two sources of information: data and domain knowledge. From data, we build an ensemble of feature selectors, described by a multinomial likelihood model. Using domain knowledge, the user guides UBayFS by weighting features and penalizing feature blocks or combinations, implemented via a Dirichlet-type prior distribution. Hence, the framework combines three main aspects: ensemble feature selection, expert knowledge, and side constraints. Our experiments demonstrate that UBayFS (a) allows for a balanced trade-off between user knowledge and data observations and (b) achieves accurate and robust results.
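The summary's core mechanism, ensemble selection counts entering a multinomial likelihood that is combined with a Dirichlet-type prior carrying user weights, can be sketched compactly. The following Python snippet is a minimal illustration only, not the authors' implementation: the choice of elementary selector (ANOVA F-scores via scikit-learn's SelectKBest), the ensemble size M, and the up-weighted features are assumptions, and the penalty machinery for feature blocks and combinations mentioned in the summary is omitted.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
n, p = X.shape
k, M = 10, 100  # features per elementary selector, ensemble size (assumed values)

# Likelihood side: train M elementary selectors on bootstrap samples
# and count how often each feature is selected (multinomial-type counts).
counts = np.zeros(p)
for _ in range(M):
    idx = rng.integers(0, n, size=n)
    sel = SelectKBest(f_classif, k=k).fit(X[idx], y[idx])
    counts[sel.get_support()] += 1

# Prior side: Dirichlet parameters act as pseudo-counts encoding expert
# weights; up-weighting features 0-4 is a hypothetical user preference.
alpha = np.ones(p)
alpha[:5] = 20.0

# Dirichlet-multinomial conjugacy: posterior parameters are simply the
# prior pseudo-counts plus the observed selection counts.
posterior = alpha + counts
importance = posterior / posterior.sum()
print("top features:", np.argsort(importance)[::-1][:k])
```

By conjugacy, expert knowledge enters as pseudo-counts added to the data-driven selection counts, which makes the trade-off between user knowledge and data observations, claim (a) in the summary, directly visible in the posterior parameters.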

MSC:

68T05 Learning and adaptive systems in artificial intelligence
