Hybrid attribute reduction based on a novel fuzzy-rough model and information granulation. (English) Zbl 1129.68073

Summary: Feature subset selection has become an important challenge in pattern recognition, machine learning and data mining. Because numerical and categorical features carry different semantics, there are two strategies for selecting hybrid attributes: discretizing the numerical variables or numericalizing the categorical features. In this paper, we introduce a simple and efficient hybrid attribute reduction algorithm based on a generalized fuzzy-rough model. A theoretical framework for the fuzzy-rough model based on fuzzy relations is presented, which lays the foundation for constructing the algorithm. We derive several attribute significance measures from the proposed fuzzy-rough model and construct a forward greedy algorithm for hybrid attribute reduction. The experiments show that computing the decision positive region with variable precision fuzzy inclusion gives the best classification performance: it selects the fewest features while attaining the highest accuracy.
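To make the idea of a forward greedy search over hybrid attributes concrete, here is a minimal Python sketch. It is not the algorithm from the paper. It assumes a simple per-attribute fuzzy relation (crisp equality for categorical values, 1 - |a - b| for numeric values already scaled to [0, 1]), combines attributes with the minimum, and uses the standard fuzzy lower approximation of crisp decision classes as the significance measure. The variable precision fuzzy inclusion mentioned in the summary is not implemented, and all function names are hypothetical.

```python
def similarity(a, b):
    """Assumed per-attribute fuzzy relation: crisp equality for categorical
    values, 1 - |a - b| (clipped at 0) for numeric values scaled to [0, 1]."""
    if isinstance(a, str) or isinstance(b, str):
        return 1.0 if a == b else 0.0
    return max(0.0, 1.0 - abs(a - b))


def relation(x, y, attrs):
    """Fuzzy relation induced by an attribute subset: minimum over attributes."""
    return min(similarity(x[i], y[i]) for i in attrs)


def dependency(samples, labels, attrs):
    """Mean membership of the samples in the fuzzy positive region."""
    if not attrs:
        return 0.0
    total = 0.0
    for x, dx in zip(samples, labels):
        # Membership of x in the lower approximation of its own class:
        # how well x is separated from every sample with a different label.
        total += min(
            (1.0 - relation(x, y, attrs)
             for y, dy in zip(samples, labels) if dy != dx),
            default=1.0,
        )
    return total / len(samples)


def greedy_reduct(samples, labels, eps=1e-9):
    """Forward greedy search: repeatedly add the attribute that raises the
    dependency most, and stop when no attribute improves it by more than eps."""
    remaining = set(range(len(samples[0])))
    selected, best = [], 0.0
    while remaining:
        scores = {a: dependency(samples, labels, selected + [a])
                  for a in remaining}
        # Iterating in sorted order makes tie-breaking deterministic.
        a_star = max(sorted(scores), key=scores.get)
        if scores[a_star] - best <= eps:
            break
        selected.append(a_star)
        remaining.remove(a_star)
        best = scores[a_star]
    return selected, best


# Toy hybrid data: attribute 0 is numeric, 1 is categorical, 2 is constant.
samples = [(0.1, "a", 0.5), (0.2, "a", 0.5), (0.9, "b", 0.5), (0.8, "b", 0.5)]
labels = [0, 0, 1, 1]
selected, gamma = greedy_reduct(samples, labels)
print(selected, gamma)  # [1] 1.0
```

On this toy data the categorical attribute alone separates the two classes, so the search stops after one step. The numeric attribute reaches only a partial dependency, and the constant attribute contributes nothing.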

MSC:

68T10 Pattern recognition, speech recognition
68T05 Learning and adaptive systems in artificial intelligence
