Feature selection for label distribution learning based on the statistical distribution of data and fuzzy mutual information. (English) Zbl 07885259

Summary: Label distribution learning (LDL) is an emerging framework in machine learning. Fuzzy mutual information is mutual information under a fuzzy environment and plays an important role in handling uncertainty. This paper explores feature selection for LDL data based on the statistical distribution of data and fuzzy mutual information. The similarity between the feature values in the feature space is first defined by means of the statistical distribution of the data, and a threshold is introduced to control the similarity. Then, the fuzzy similarity relation for each feature subset is established via the similarity. This method utilizes adjustable fuzzy similarity radii to establish a fuzzy similarity relation and improve the classification ability of the data. The decision relation in the label space is then presented, and the decision class of each sample is constructed. Subsequently, two feature selection algorithms based on fuzzy mutual information are designed to remove the irrelevant features by employing a strategy that considers the correlation between the features and labels as well as the redundancy between the features in the LDL data. Finally, the experimental results show that the designed algorithms can effectively measure the uncertainty of LDL data and outperform four state-of-the-art feature selection algorithms. Specifically, our algorithms, LDFM and LDFMR, demonstrate their superiority by achieving overall average ranking improvements of 63.64% and 58.52%, respectively, across six evaluation metrics compared to the other four algorithms.
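The pipeline outlined in the summary (a thresholded fuzzy similarity relation per feature, a relation on the label space, and a greedy search that trades relevance against redundancy) can be illustrated with a minimal Python sketch. It is not the paper's LDFM or LDFMR algorithm. The range-normalised similarity, the total-variation label relation, the entropy form in the style of Hu et al. [5], and all function names and the `radius` parameter are assumptions made for illustration.

```python
import numpy as np

def fuzzy_similarity(column, radius=0.5):
    """Assumed relation for one feature: 1 - |a - b| / range, set to 0 below `radius`."""
    column = np.asarray(column, dtype=float)
    span = column.max() - column.min()
    if span == 0:  # a constant feature cannot distinguish any samples
        return np.ones((column.size, column.size))
    sim = 1.0 - np.abs(column[:, None] - column[None, :]) / span
    sim[sim < radius] = 0.0
    return sim

def label_relation(distributions):
    """Assumed relation on the label space: 1 - total variation distance."""
    d = np.asarray(distributions, dtype=float)
    return 1.0 - 0.5 * np.abs(d[:, None, :] - d[None, :, :]).sum(axis=2)

def fuzzy_entropy(relation):
    """H(R) = -(1/n) * sum_i log2(|[x_i]_R| / n), with |.| the fuzzy cardinality."""
    n = relation.shape[0]
    return float(-np.mean(np.log2(relation.sum(axis=1) / n)))

def fuzzy_mutual_information(rel_a, rel_b):
    """I(A; B) = H(A) + H(B) - H(A, B); the joint relation is the elementwise minimum."""
    joint = np.minimum(rel_a, rel_b)
    return fuzzy_entropy(rel_a) + fuzzy_entropy(rel_b) - fuzzy_entropy(joint)

def select_features(X, D, k, radius=0.5):
    """Greedy max-relevance, min-redundancy ranking; returns k feature indices."""
    X = np.asarray(X, dtype=float)
    relations = [fuzzy_similarity(X[:, j], radius) for j in range(X.shape[1])]
    rel_d = label_relation(D)
    relevance = [fuzzy_mutual_information(r, rel_d) for r in relations]
    selected, remaining = [], list(range(X.shape[1]))

    def score(j):
        if not selected:
            return relevance[j]
        redundancy = np.mean(
            [fuzzy_mutual_information(relations[j], relations[s]) for s in selected]
        )
        return relevance[j] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

Here `X` is a samples-by-features array and `D` holds one label distribution per row, each summing to 1. A larger `radius` zeroes out more of the weak similarities, which makes each fuzzy relation finer. The adjustable similarity radii mentioned in the summary presumably serve a comparable purpose, although their exact definition is not given here.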

MSC:

68-XX Computer science
62-XX Statistics
Full Text: DOI

References:

[1] Bermejo, P.; Gámez, J. A.; Puerta, J. M., Speeding up incremental wrapper feature subset selection with Naive Bayes classifier, Knowl.-Based Syst., 55, 140-147, 2014
[2] Demšar, J., Statistical comparisons of classifiers over multiple data sets, J. Mach. Learn. Res., 7, 1-30, 2006 · Zbl 1222.68184
[3] Friedman, M., A comparison of alternative tests of significance for the problem of m rankings, Ann. Math. Stat., 11, 86-92, 1940 · Zbl 0063.01455
[4] Geng, X., Label distribution learning, IEEE Trans. Knowl. Data Eng., 28, 1734-1748, 2016
[5] Hu, Q. H.; Yu, D. R.; Xie, Z. X.; Liu, J. F., Fuzzy probabilistic approximation spaces and their information measures, IEEE Trans. Fuzzy Syst., 14, 191-201, 2006
[6] He, Z. F.; Yang, M.; Gao, Y.; Liu, H. D.; Yin, Y. L., Joint multi-label classification and label correlations with missing labels and feature selection, Knowl.-Based Syst., 163, 145-158, 2019
[7] Hu, Q. H.; Yu, D.; Xie, Z. X., Information-preserving hybrid data reduction based on fuzzy-rough techniques, Pattern Recognit. Lett., 27, 414-423, 2006
[8] Jalilvand, A.; Salim, N., Feature unionization: a novel approach for dimension reduction, Appl. Soft Comput., 52, 1253-1261, 2017
[9] Kashef, S.; Nezamabadi-pour, H.; Nikpour, B., Multilabel feature selection: a comprehensive review and guiding experiments, Data Min. Knowl. Discov., 8, Article e1240 pp., 2018
[10] Kim, K. J.; Jun, C. H., Rough set model based feature selection for mixed-type data with feature space decomposition, Expert Syst. Appl., 103, 196-205, 2018
[11] Li, Z. W.; Huang, H. X.; Huang, Q.; Lin, Y. H., Attribute reduction for hybrid data based on statistical distribution of data and fuzzy evidence theory, Inf. Sci., 662, Article 120247 pp., 2024 · Zbl 07840725
[12] Liu, J. H.; Lin, Y. J.; Ding, W. P.; Zhang, H. B.; Wang, C.; Du, J. X., Multi-label feature selection based on label distribution and neighborhood rough set, Neurocomputing, 524, 142-157, 2023
[13] Liu, J. H.; Lin, Y. J.; Ding, W. P.; Zhang, H. B.; Du, J. X., Fuzzy mutual information-based multilabel feature selection with label dependency and streaming labels, IEEE Trans. Fuzzy Syst., 31, 77-91, 2022
[14] Li, Y.; Li, T.; Liu, H., Recent advances in feature selection and its applications, Knowl. Inf. Syst., 53, 551-577, 2017
[15] Lin, Y. J.; Li, Y. W.; Wang, C. X., Attribute reduction for multi-label learning with fuzzy rough set, Knowl.-Based Syst., 152, 51-61, 2018
[16] Li, F.; Miao, D. Q.; Pedrycz, W., Granular multi-label feature selection based on mutual information, Pattern Recognit., 67, 410-423, 2017
[17] Li, G. L.; Zhang, H. R.; Min, F.; Lu, Y. N., Two-stage label distribution learning with label-independent prediction based on label-specific features, Knowl.-Based Syst., 267, Article 110426 pp., 2023
[18] Maldonado, S., Dealing with high-dimensional class-imbalanced data sets: embedded feature selection for SVM classification, Appl. Soft Comput., 67, 94-105, 2018
[19] Peng, H. C.; Long, F. H.; Ding, C., Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy, IEEE Trans. Pattern Anal. Mach. Intell., 27, 1226-1238, 2005
[20] Qian, W. B.; Xiong, C. Z.; Wang, Y. L., A ranking-based feature selection for multi-label classification with fuzzy relative discernibility, Appl. Soft Comput., 102, Article 106995 pp., 2021
[21] Qian, W. B.; Huang, J. T.; Wang, Y. L.; Shu, W. H., Mutual information-based label distribution feature selection for multi-label learning, Knowl.-Based Syst., 195, Article 105684 pp., 2020
[22] Qian, W. B.; Huang, J. T.; Wang, Y. L.; Xie, Y. H., Label distribution feature selection for multi-label classification with rough set, Int. J. Approx. Reason., 128, 32-55, 2021 · Zbl 1460.68090
[23] Qian, W. B.; Xu, F. K.; Huang, J. T.; Qian, J., A novel granular ball computing-based fuzzy rough set for feature selection in label distribution learning, Knowl.-Based Syst., 278, Article 110898 pp., 2023
[24] Qian, Y. H.; Liang, J. Y.; Pedrycz, W.; Dang, C. Y., Positive approximation: an accelerator for attribute reduction in rough set theory, Artif. Intell., 174, 597-618, 2010 · Zbl 1205.68310
[25] Reamaroon, N.; Sjoding, M. W.; Lin, K.; Iwashyna, T. J.; Najarian, K., Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome, IEEE J. Biomed. Health Inform., 23, 407-415, 2018
[26] Shannon, C. E., A mathematical theory of communication, Bell Syst. Tech. J., 27, 379-423, 1948 · Zbl 1154.94303
[27] Shu, W. H.; Qian, W. B.; Xie, Y. H., Incremental feature selection for dynamic hybrid data using neighborhood rough set, Knowl.-Based Syst., 194, Article 105516 pp., 2020
[28] Tsai, Y. S.; Yang, U. C.; Chung, I. F.; Huang, C. D., A comparison of mutual and fuzzy-mutual information-based feature selection strategies, (IEEE International Conference on Fuzzy Systems, 2013), 1-6
[29] Wilcoxon, F., Individual comparisons by ranking methods, Biometrics, 1, 80-83, 1945
[30] Wang, Y. Y.; Dai, J. H., Label distribution feature selection based on mutual information in fuzzy rough set theory, (International Joint Conference on Neural Networks, 2019), 14-19
[31] Wang, C. Z.; Wang, C. Y.; Qian, Y. H.; Leng, Q. K., Feature selection based on weighted fuzzy rough sets, IEEE Trans. Fuzzy Syst.
[32] Xu, J. H., A weighted linear discriminant analysis framework for multi-label feature extraction, Neurocomputing, 275, 107-120, 2018
[33] Xu, S. P.; Ju, H. R.; Shang, L., Label distribution learning: a local collaborative mechanism, Int. J. Approx. Reason., 121, 59-84, 2020
[34] Xu, J. C.; Meng, X. R.; Qu, K. L.; Sun, Y. H.; Hou, Q. C., Feature selection using relative dependency complement mutual information in fitting fuzzy rough set model, Appl. Intell., 53, 1-24, 2023
[35] Xiong, C. Z.; Qian, W. B.; Wang, Y. L.; Huang, J. T., Feature selection based on label distribution and fuzzy mutual information, Inf. Sci., 574, 297-319, 2021 · Zbl 1531.68105
[36] Xu, N.; Liu, Y. P.; Geng, X., Label enhancement for label distribution learning, IEEE Trans. Knowl. Data Eng., 33, 1632-1643, 2019
[37] Yu, D.; An, S.; Hu, Q. H., Fuzzy mutual information based min-redundancy and max-relevance heterogeneous feature selection, Int. J. Comput. Intell. Syst., 4, 619-633, 2011
[38] Yang, X. B.; Yao, Y. Y., Ensemble selector for attribute reduction, Appl. Soft Comput., 70, 1-11, 2018
[39] Yuan, Z.; Chen, H. M.; Zhang, P. F.; Wan, J. H.; Li, T. R., A novel unsupervised approach to heterogeneous feature selection based on fuzzy mutual information, IEEE Trans. Fuzzy Syst., 30, 3395-3409, 2021
[40] Zadeh, L., Fuzzy sets, Inf. Control, 8, 338-353, 1965 · Zbl 0139.24606
[41] Zhao, D. Y.; Zhang, X.; Zhou, Y.; Geng, X., Emotion distribution learning from texts, (Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016), 638-647