
Dealing with missing values in a probabilistic decision tree during classification. (English) Zbl 1171.68664

Zighed, Djamel A. (ed.) et al., Mining complex data. Berlin: Springer (ISBN 978-3-540-88066-0/hbk). Studies in Computational Intelligence 165, 55-74 (2009).
Summary: This chapter deals with the problem of missing values in decision trees during classification. Our approach is derived from the ordered attribute trees method proposed by O. Lobo and M. Numao [The Japanese Society for Artificial Intelligence 1, 162–168 (2000)], which builds a decision tree for each attribute and uses these trees to fill in missing attribute values. Our method takes the dependence between attributes into account by using mutual information. The result of the classification process is a probability distribution instead of a single class. In this chapter, we explain our approach, present tests of our approach on several real databases, and compare the results with those given by Lobo’s method and Quinlan’s method. We also measure the quality of our classification results. Finally, we calculate the complexity of our approach and discuss some perspectives.
For the entire collection see [Zbl 1152.68007].
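The two ingredients mentioned in the summary can be illustrated with a short sketch. The following Python fragment is a minimal illustration, not the authors' implementation: the tree-node layout (dictionaries with attr, children, weights and dist keys) is assumed for the example. It orders attributes by their mutual information with the class, as in Lobo and Numao's ordered attribute trees, and, when a split attribute value is missing at classification time, sends the instance down every branch weighted by training frequencies, so the result is a class probability distribution rather than a single class.

from collections import Counter, defaultdict
from math import log2

def mutual_information(xs, ys):
    # Empirical mutual information I(X;Y) between two discrete sequences.
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def order_attributes(rows, class_index):
    # Order attribute indices by decreasing mutual information with the class.
    classes = [r[class_index] for r in rows]
    attrs = [i for i in range(len(rows[0])) if i != class_index]
    return sorted(attrs,
                  key=lambda i: mutual_information([r[i] for r in rows], classes),
                  reverse=True)

def classify(node, instance, missing="?"):
    # Return a class probability distribution for one instance.
    if node["leaf"]:
        return node["dist"]                      # e.g. {"yes": 0.8, "no": 0.2}
    value = instance[node["attr"]]
    if value != missing and value in node["children"]:
        return classify(node["children"][value], instance, missing)
    # Missing (or unseen) value: follow every branch, weighted by the
    # fraction of training examples that took it, and sum the results.
    combined = defaultdict(float)
    for v, child in node["children"].items():
        for cls, p in classify(child, instance, missing).items():
            combined[cls] += node["weights"][v] * p
    return dict(combined)

# Toy usage: a one-split tree over attribute 0 with values "a"/"b".
tree = {"leaf": False, "attr": 0,
        "weights": {"a": 0.6, "b": 0.4},
        "children": {"a": {"leaf": True, "dist": {"yes": 0.9, "no": 0.1}},
                     "b": {"leaf": True, "dist": {"yes": 0.2, "no": 0.8}}}}
print(classify(tree, ["?"]))   # approximately {'yes': 0.62, 'no': 0.38}
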

MSC:

68T05 Learning and adaptive systems in artificial intelligence
68W05 Nonnumerical algorithms
Full Text: DOI

References:

[1] Breiman, L.; Friedman, J. H.; Olshen, R. A.; Stone, C. J., Classification and regression trees (1984), CA: Wadsworth International Group · Zbl 0541.62042
[2] Little, R. J. A.; Rubin, D. B., Statistical Analysis with Missing Data (2002), Chichester: Wiley-Interscience · Zbl 1011.62004
[3] Shannon, C., Weaver, W.: Théorie mathématique de la communication. Les classiques des sciences humaines (1949) · Zbl 1489.94001
[4] Witten, I. H.; Frank, E., Data Mining: Practical Machine Learning Tools and Techniques (2005), San Francisco: Morgan Kaufmann · Zbl 1076.68555
[5] Quinlan, J. R., C4.5: Programs for Machine Learning (1993), San Diego: Morgan Kaufmann
[6] Tan, P.-N.; Steinbach, M.; Kumar, V., Introduction to Data Mining (2006), Reading: Addison-Wesley
[7] Friedman, J. H.; Kohavi, R.; Yun, Y., Lazy Decision Trees, Proc. 13th National Conference on Artificial Intelligence and the Eighth Innovative Applications of Artificial Intelligence Conference, 717-724 (1996), Menlo Park: AAAI Press
[8] Hawarah, L.; Simonet, A.; Simonet, M.; Galindo, F.; Takizawa, M.; Traunmüller, R., A probabilistic approach to classify incomplete objects using decision trees, Database and Expert Systems Applications, 549-558 (2004), Heidelberg: Springer
[9] Hawarah, L.; Simonet, A.; Simonet, M.; Bressan, S.; Küng, J.; Wagner, R., Evaluation of a probabilistic approach to classify incomplete objects using decision trees, Database and Expert Systems Applications, 193-202 (2006), Heidelberg: Springer · doi:10.1007/11827405_19
[10] Hawarah, L., Simonet, A., Simonet, M.: The complexity of a probabilistic approach to deal with missing values in a decision tree. In: 8th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC 2006), Romania, pp. 26-29 (September 2006b) · Zbl 1171.68664
[11] Hawarah, L., Simonet, A., Simonet, M.: Dealing with Missing Values in a Probabilistic Decision Tree during Classification. In: The Sixth IEEE International Conference on Data Mining-Workshops (ICDM Workshops 2006), Hong Kong, China, December 18-22 (2006c) · Zbl 1171.68664
[12] Kira, K., Rendell, L.A.: A practical approach to feature selection. In: ML 1992: Proceedings of the Ninth International Workshop on Machine Learning, San Francisco, CA, USA, pp. 249-256 (1992)
[13] Kononenko, I., Bratko, I., Roskar, E.: Experiments in Automatic Learning of Medical Diagnostic Rules. Technical Report, Jozef Stefan Institute, Ljubljana, Yugoslavia (1984)
[14] Kononenko, I., Estimating attributes: Analysis and extensions of RELIEF, ECML: European Conference on Machine Learning, 171-182 (1994), Heidelberg: Springer
[15] Liu, W. Z.; White, A. P.; Thompson, S. G.; Bramer, M. A.; Liu, X.; Cohen, P.; Berthold, M., Techniques for Dealing with Missing Values in Classification, Advances in Intelligent Data Analysis (1997), Heidelberg: Springer
[16] Lobo, O.; Numao, M., Ordered estimation of missing values, PAKDD 1999: Proceedings of the Third Pacific-Asia Conference on Methodologies for Knowledge Discovery and Data Mining, 499-503 (1999), London: Springer
[17] Lobo, O.; Numao, M., Ordered estimation of missing values for propositional learning, The Japanese Society for Artificial Intelligence, 1, 162-168 (2000)
[18] Lobo, O., Numao, M.: Suitable domains for using ordered attribute trees to impute missing values. IEICE Trans. Inf. Syst. E84-D, No. 2 (February 2001)
[19] Martin, J.K., Hirschberg, D.S.: The time complexity of decision tree induction. Technical Report ICS-TR-95-27 (1995)
[20] Newman, D., Hettich, S., Blake, C., Merz, C.: UCI Repository of machine learning databases (1998), http://www.ics.uci.edu/~mlearn/MLRepository.html
[21] Quinlan, J. R., Induction of decision trees, Machine Learning, 1, 81-106 (1986)
[22] Quinlan, J. R., Unknown attribute values in induction, Proc. Sixth International Machine Learning Workshop (1989), San Francisco: Morgan Kaufmann, San Francisco
[23] Quinlan, J. R., Probabilistic decision trees, Machine Learning: an Artificial Intelligence Approach, 3, 140-152 (1990)
[24] Robnik-Sikonja, M., Kononenko, I.: Attribute dependencies, understandability and split selection in tree based models. In: Machine Learning: Proceedings of the Sixteenth International Conference. ICML 1999, pp. 344-353 (1999)
[25] Robnik-Sikonja, M.; Kononenko, I., Theoretical and empirical analysis of ReliefF and RReliefF, Mach. Learn., 53, 1-2, 23-69 (2003) · Zbl 1076.68065 · doi:10.1023/A:1025667309714