
Invariant optimal feature selection: A distance discriminant and feature ranking based solution. (English) Zbl 1140.68470

Summary: The goal of feature selection is to find the optimal subset of \(m\) features chosen from the \(n\) available features. A critical problem for many feature selection methods is that an exhaustive search strategy must be applied to find the best subset among all \(\binom{n}{m}\) possible feature subsets, which usually results in considerably high computational complexity. Suboptimal feature selection methods provide more practical solutions in terms of computational complexity, but they cannot guarantee that the selected feature subset is globally optimal. We propose a new Feature Selection algorithm based on a Distance Discriminant (FSDD), which not only avoids the high computational cost but also overcomes the drawbacks of the suboptimal methods. The proposed method is able to find the optimal feature subset without an exhaustive search or a branch-and-bound algorithm. The most difficult problem in optimal feature selection, the search problem, is converted into a feature ranking problem, backed by a rigorous theoretical proof, so that the computational complexity is greatly reduced. The proposed method is invariant to linear transformations of the data with a diagonal transformation matrix. FSDD was compared with ReliefF and \(mrmr\)MID (based on mutual information) on 8 data sets. The experimental results show that FSDD outperforms the other two methods and is highly efficient.
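The core idea, converting a combinatorial subset search into a single per-feature ranking pass, can be sketched in a few lines. The Python sketch below is illustrative only and does not reproduce the paper's exact FSDD criterion: the assumed score (prior-weighted between-class spread of the class means minus the within-class scatter, normalized by each feature's total variance) is a hypothetical stand-in chosen to show how a distance-discriminant style score is computed feature by feature, and how the normalization makes the ranking invariant under a diagonal linear transformation of the data.

    import numpy as np

    def fsdd_style_score(X, y):
        """Per-feature distance-discriminant score (illustrative sketch).

        Assumed form: prior-weighted between-class spread of the class
        means minus the within-class scatter, normalized by the total
        variance of each feature. The paper's exact FSDD criterion
        differs; this stand-in only illustrates the ranking idea.
        """
        classes = np.unique(y)
        overall_mean = X.mean(axis=0)
        between = np.zeros(X.shape[1])
        within = np.zeros(X.shape[1])
        for c in classes:
            Xc = X[y == c]
            p = len(Xc) / len(X)                      # class prior
            between += p * (Xc.mean(axis=0) - overall_mean) ** 2
            within += p * Xc.var(axis=0)
        # Dividing by the total per-feature variance (= between + within,
        # by the law of total variance) cancels any per-feature scale
        # factor, so the ranking is invariant under a diagonal linear
        # transformation of the data, mirroring the invariance claim.
        return (between - within) / X.var(axis=0)

    def select_top_m(X, y, m):
        """Rank all n features once and keep the m best: O(n) scores
        instead of the C(n, m) subsets an exhaustive search must check."""
        return np.argsort(fsdd_style_score(X, y))[::-1][:m]

    # Toy check: feature 3 drives the labels, so it should rank highly,
    # and rescaling the features must not change the selected subset.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))
    y = (X[:, 3] + 0.1 * rng.normal(size=200) > 0).astype(int)
    scales = rng.uniform(1.0, 10.0, size=10)          # diagonal transform
    assert set(select_top_m(X, y, 3)) == set(select_top_m(X * scales, y, 3))

Because every feature is scored independently, selecting the top \(m\) features costs one pass over the data plus a sort, rather than evaluating \(\binom{n}{m}\) candidate subsets; for \(n = 50\) and \(m = 10\), that is 50 scores against roughly \(10^{10}\) subsets.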

MSC:

68T10 Pattern recognition, speech recognition

Software:

ReliefF

References:

[1] Jain, A. K.; Duin, R. P. W.; Mao, J., Statistical pattern recognition: a review, IEEE Trans. Pattern Anal. Mach. Intell., 22, 1 (2000)
[2] Guyon, I.; Elisseeff, A., An introduction to variable and feature selection, J. Mach. Learn. Res., 3, 1157-1182 (2003) · Zbl 1102.68556
[3] Dash, M.; Liu, H., Feature selection for classification, Intelligent Data Anal., 1, 131-156 (1997)
[4] R. Gilad-Bachrach, A. Navot, N. Tishby, Margin based feature selection—theory and algorithms, in: Proceedings of the 21st International Conference on Machine Learning, 2004.
[5] Quah, K. H.; Quek, C., MCES: a novel Monte Carlo evaluative selection approach for objective feature selections, IEEE Trans. Neural Networks, 18, 2 (2007)
[6] Dy, J.; Brodley, C. E., Feature selection for unsupervised learning, J. Mach. Learn. Res., 5, 845-889 (2004) · Zbl 1222.68187
[7] Z. Zhao, H. Liu, Spectral feature selection for supervised and unsupervised learning, in: Proceedings of the 24th International Conference on Machine Learning (ICML 2007), 2007.
[8] Wolf, L.; Shashua, A., Feature selection for unsupervised and supervised inference: the emergence of sparsity in a weight-based approach, J. Mach. Learn. Res., 6, 1855-1887 (2005) · Zbl 1222.68333
[9] Kohavi, R.; John, G. H., Wrappers for feature subset selection, Artif. Intell., 97, 273-324 (1997) · Zbl 0904.68143
[10] Aeberhard, S.; De Vel, O. Y.; Coomans, D. H., New fast algorithms for error rate-based stepwise variable selection in discriminant analysis, SIAM J. Sci. Comput., 22, 3, 1036-1052 (2000) · Zbl 0968.68204
[11] Kumar, R.; Jayaraman, V. K.; Kulkarni, B. D., An SVM classifier incorporating simultaneous noise reduction and feature selection: illustrative case examples, Pattern Recognition, 38, 1, 41-49 (2005)
[12] Somol, P.; Pudil, P.; Kittler, J., Fast branch & bound algorithms for optimal feature selection, IEEE Trans. Pattern Anal. Mach. Intell., 26, 7 (2004)
[13] Jain, A.; Zongker, D., Feature selection: evaluation, application, and small sample performance, IEEE Trans. Pattern Anal. Mach. Intell., 19, 2 (1997)
[14] P. Pudil, F.J. Ferri, J. Novovicova, J. Kittler, Floating search methods for feature selection with nonmonotonic criterion functions, in: Proceedings of the 12th IAPR International Conference on Pattern Recognition, Conference B: Computer Vision & Image Processing, vol. 2, 1994.
[15] Kudo, M.; Sklansky, J., Comparison of algorithms that select features for pattern classifiers, Pattern Recognition, 33, 25-41 (2000)
[16] Pacheco, J.; Casado, S.; Núñez, L.; Gómez, O., Analysis of new variable selection methods for discriminant analysis, Comput. Stat. Data Anal., 51, 1463-1478 (2006) · Zbl 1157.62442
[17] Robnik-Šikonja, M.; Kononenko, I., Theoretical and empirical analysis of ReliefF and RReliefF, Mach. Learn., 53, 23-69 (2003) · Zbl 1076.68065
[18] Liu, H.; Li, J.; Wong, L., A comparative study on feature selection and classification methods using gene expression profiles and proteomic patterns, Genome Inf., 13, 51-60 (2002)
[19] Peng, H.; Long, F.; Ding, C., Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy, IEEE Trans. Pattern Anal. Mach. Intell., 27, 1226-1238 (2005)
[20] Available at ⟨http://www.ics.uci.edu/mlearn/databases/⟩.
[21] Available at ⟨http://lib.stat.cmu.edu/datasets/⟩.
[22] Friedman, M.; Kandel, A., Introduction to Pattern Recognition: Statistical, Structural, Neural and Fuzzy Logic Approaches (1999), World Scientific Publishing: World Scientific Publishing Singapore, pp. 143-147 · Zbl 0940.68115
[23] Li, H.; Jiang, T.; Zhang, K., Efficient and robust feature extraction by maximum margin criterion, IEEE Trans. Neural Networks, 17, 1, 157-165 (2006)
[24] Kwak, N.; Choi, C. H., Input feature selection by mutual information based on Parzen window, IEEE Trans. Pattern Anal. Mach. Intell., 24, 12, 1667-1671 (2002)
[25] Chow, T. W.S.; Huang, D., Estimating optimal feature subsets using efficient estimation of high-dimensional mutual information, IEEE Trans. Neural Networks, 16, 1 (2005)
[26] Witten, I. H.; Frank, E., Data Mining: Practical Machine Learning Tools and Techniques (2005), Morgan Kaufmann Publishers: Morgan Kaufmann Publishers Los Altos, CA · Zbl 1076.68555
[27] Esposito, F.; Malerba, D.; Semeraro, G., A comparative analysis of methods for pruning decision trees, IEEE Trans. Pattern Anal. Mach. Intell., 19, 5 (1997)
[28] Available at ⟨http://www.csie.ntu.edu.tw/cjlin/libsvm⟩.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data-conversion errors. In some cases these data have been complemented/enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or perfect matching.