
A novel univariate marginal distribution algorithm based discretization algorithm. (English) Zbl 1312.62009

Summary: Many data mining algorithms can handle only discrete data, or perform better on discrete data. For technical reasons, however, real-world data are often available only as continuous values. Discretization therefore plays an important role in data mining. It is defined as the process of mapping a continuous attribute space into a discrete space, that is, representing continuous ranges by integer values or symbols. In this paper, we propose a discretization method based on the Univariate Marginal Distribution Algorithm (UMDA), which combines statistical learning with evolutionary algorithms. The fitness function of the UMDA takes into account not only the accuracy of the classifier but also the number of breakpoints. Experimental results show that the proposed algorithm effectively reduces the number of breakpoints while also improving the accuracy of the classifier.
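The summary does not give the encoding or the exact fitness formula, so the following is only a minimal Python sketch, under stated assumptions, of how a UMDA can search for breakpoints on a single continuous attribute. Each candidate breakpoint (a midpoint between consecutive distinct values) is one bit. Each generation samples bit strings from independent per-bit probabilities, keeps the best ones, and re-estimates each probability from them. Fitness is assumed to be accuracy minus a penalty `lam` times the fraction of candidates selected. The classifier is a hypothetical stand-in (majority label per interval, scored on the training data), not the classifier used in the paper. All function names (`discretize`, `candidate_cuts`, `interval_accuracy`, `fitness`, `umda_discretize`) and parameters (`pop`, `keep`, `gens`, `lam`, `seed`) are illustrative.

```python
import bisect
import random
from collections import Counter, defaultdict


def discretize(values, cuts):
    """Map each value to an interval index: the number of cuts <= value."""
    cuts = sorted(cuts)
    return [bisect.bisect_right(cuts, v) for v in values]


def candidate_cuts(values):
    """Candidate breakpoints: midpoints between consecutive distinct values."""
    xs = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(xs, xs[1:])]


def interval_accuracy(codes, labels):
    """Hypothetical stand-in classifier: predict the majority label of each
    interval and return the resulting training accuracy."""
    per_interval = defaultdict(Counter)
    for code, label in zip(codes, labels):
        per_interval[code][label] += 1
    correct = sum(max(c.values()) for c in per_interval.values())
    return correct / len(labels)


def fitness(mask, cands, values, labels, lam):
    """Assumed fitness: accuracy minus lam times the fraction of cuts used."""
    cuts = [c for c, bit in zip(cands, mask) if bit]
    acc = interval_accuracy(discretize(values, cuts), labels)
    return acc - lam * len(cuts) / len(cands)


def umda_discretize(values, labels, pop=40, keep=20, gens=30, lam=0.1, seed=0):
    """Search for a small, accurate set of cut points with a binary UMDA."""
    cands = candidate_cuts(values)
    if not cands:
        return []
    rng = random.Random(seed)
    n = len(cands)
    p = [0.5] * n  # marginal probability that each candidate cut is selected
    best_fit, best = float("-inf"), None
    for _ in range(gens):
        # Sample a population from the product of independent Bernoulli marginals.
        population = [[int(rng.random() < pi) for pi in p] for _ in range(pop)]
        scored = [(fitness(m, cands, values, labels, lam), m) for m in population]
        scored.sort(key=lambda t: t[0], reverse=True)
        if scored[0][0] > best_fit:
            best_fit, best = scored[0]
        # Truncation selection, then re-estimate each marginal independently.
        elite = [m for _, m in scored[:keep]]
        p = [sum(m[i] for m in elite) / keep for i in range(n)]
        # Clamp away from 0 and 1 to keep some diversity (an added assumption).
        p = [min(max(pi, 0.05), 0.95) for pi in p]
    return [c for c, bit in zip(cands, best) if bit]


values = [1.0, 1.2, 1.4, 3.0, 3.3, 3.9, 6.1, 6.5]
labels = ["a", "a", "a", "b", "b", "b", "c", "c"]
cuts = umda_discretize(values, labels)
print(cuts, discretize(values, cuts))
```

Because the penalty grows with the number of selected cuts, two equally accurate cut sets are ranked by size. This reflects the trade-off the summary describes between classifier accuracy and the number of breakpoints. A multi-attribute version would concatenate one such bit string per attribute.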

MSC:

62-07 Data analysis (statistics) (MSC2010)
68P20 Information storage and retrieval of data

Software:

LIBSVM; UCI-ml

References:

[1] Bashir, S.; Naeem, M.; Shah, S. I., A comparative study of heuristic algorithms: GA and UMDA in spatially multiplexed communication systems, Engineering Applications of Artificial Intelligence, 23, 1, 95-101 (2010)
[2] Blake, C. L.; Merz, C. J., UCI Repository of machine learning databases, University of California, Irvine, Department of Information and Computer Science (1998). http://www.ics.uci.edu/~mlearn/MLRepository.htm
[3] Butterworth, R.; Simovici, D. A.; Santos, G. S.; Ohno-Machado, L., A greedy algorithm for supervised discretization, Journal of Biomedical Informatics, 37, 4, 285-292 (2004)
[4] Chang-Hwan, L., A Hellinger-based discretization method for numeric attributes in classification learning, Knowledge-Based Systems, 20, 4, 419-425 (2007)
[5] Chang, C.-C.; Lin, C.-J., LIBSVM: a library for support vector machines (2001). http://www.csie.ntu.edu.tw/~cjlin/libsvm/ · Zbl 0993.68080
[6] Chengjung, T.; Chien-I, L.; Weipang, Y., A discretization algorithm based on class-attribute contingency coefficient, Information Sciences, 178, 3, 714-731 (2008)
[7] Cheong Hee, P.; Moonhwi, L., A SVM-based discretization method with application to associative classification, Expert Systems with Applications, 4784-4787 (2009)
[8] Chin, L.; Cheng-Jung, T.; Ya-Ru, Y.; Wei-Pang, Y., A top-down and greedy method for discretization of continuous attributes, 2007 International Conference on Fuzzy Systems and Knowledge Discovery, Haikou, China, 468-472 (2007)
[9] Chiu, D.; Wong, A.; Cheung, B., Information discovery through hierarchical maximum entropy discretization and synthesis, (Piatetsky-Shapiro, G.; Frawley, W. J., Knowledge Discovery in Databases (1991), MIT Press), 125-140, xii+525
[10] Crossingham, B.; Marwala, T.; Lagazio, M., Evolutionarily optimized rough sets partitions, ICIC Express Letters, 3, 3A, 241-246 (2009)
[11] Fayyad, U.; Piatetsky-Shapiro, G.; Smyth, P., From data mining to knowledge discovery in databases, AI Magazine, 17, 3, 37-54 (1996)
[12] Genzhu, Bai; Zhili, Pei; Jian, Wang; Ying, Kong; Lisha, Liu, Attribute discretization method based on rough set theory and information entropy, Application Research of Computers, 1701-1703 (2008)
[13] Hao, Z.; Duoqian, M.; Ruizhi, W., A modified Chi2 algorithm based on the significance of attribute, 2006 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology Workshops, Hong Kong, China, 490-493 (2006)
[14] He, L.; Dayou, Y.; Xiaohu, S.; Ying, G., An attribute discretization algorithm based on Rough Set and information entropy, Proceedings of the Seventh International Conference on Machine Learning and Cybernetics, Kunming, China, 206-211 (2008)
[15] Huizhong, Y.; Junxia, W.; Xinguang, S.; Namsun, W., Information system continuous attribute discretization based on binary particle swarm optimization, 2007 International Conference on Fuzzy Systems and Knowledge Discovery, Haikou, China, 170-174 (2007)
[16] Janssens, D.; Brijs, T.; Vanhoof, K.; Wets, G., Evaluating the performance of cost-based discretization versus entropy- and error-based discretization, Computers & Operations Research, 33, 11, 3107-3123 (2006) · Zbl 1113.90093
[17] Jinjie, H.; Shiyong, L., A GA-based approach to rough data model, Fifth World Congress on Intelligent Control and Automation, Haikou, China, 1880-1884 (2004)
[18] Kim, K. J.; Han, I., Genetic algorithms approach to feature discretization in artificial neural networks for the prediction of stock price index, Expert Systems with Applications, 19, 2, 125-132 (2000)
[19] Lei, L.; Deqin, Y.; Yu, S., Bayesian-Chi2 algorithm for discretization of real value attributes, Computer Engineering and Applications, 39-40, 43 (2008)
[20] Lei, X.; Fengming, Z.; Xiaochao, J., Discretization algorithm for continuous attributes based on niche discrete particle swarm optimization, Journal of Data Acquisition & Processing, 23, 5, 584-588 (2008)
[21] Muehlenbein, H., The equation for response to selection and its use for prediction, Evolutionary Computation, 5, 3, 303-346 (1997)
[22] Øhrn, A., Rosetta Technical Reference Manual (1999). http://www.lcb.uu.se/tools/rosetta/materials/manual.pdf
[23] Sang, Y.; Deqin, Y.; Lei, L.; Hongxia, L., Imp-Chi2 algorithm for discretization of real value attributes, Computer Engineering, 34, 17, 39-41 (2008)
[24] Shang, L.; Yu, S. Y.; Jia, X. Y.; Ji, Y. S., Selection and optimization of cut-points for numeric attribute values, Computers & Mathematics with Applications, 57, 6, 1018-1023 (2009) · Zbl 1186.68391
[25] Shude, Z.; Zengqi, S., A survey on estimation of distribution algorithms, Acta Automatica Sinica, 33, 2, 113-121 (2007) · Zbl 1174.90912
[26] Wenyu, Q.; Eqian, Y.; Sang, Y.; Hongxia, L.; Kitsuregawa, M.; Keqiu, L., A novel Chi2 algorithm for discretization of continuous attributes, Progress in WWW Research and Development, 10th Asia-Pacific Web Conference, 560-571 (2008)
[27] Xiaofeng, L.; Pedrycz, W., Logic-based fuzzy networks: A study in system modeling with triangular norms and uninorms, Fuzzy Sets and Systems, 160, 24, 3475-3502 (2009) · Zbl 1185.68546
[28] Zhanguo, X.; Shixiong, X.; Qiang, N.; Lei, Z., Method of discretization of continuous attributes based on improved genetic algorithm, Computer Engineering and Design, 29, 16, 4275-4276, 4279 (2008)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.