Evaluation of decision trees: a multi-criteria approach. (English) Zbl 1068.68055

Summary: Data Mining (DM) techniques are increasingly used in modern organizations to retrieve valuable knowledge structures from organizational databases, including data warehouses. An important knowledge structure that can result from data mining activities is the Decision Tree (DT), which is used for the classification of future events. A decision tree is induced through a supervised knowledge discovery process in which prior knowledge of the classes in the database guides the discovery. Generating a DT is a relatively easy task, but to select the most appropriate DT the DM project team must generate and analyze a significant number of DTs based on multiple performance measures. We propose a multi-criteria decision analysis based process that would empower DM project teams to do thorough experimentation and analysis without being overwhelmed by the task of analyzing a significant number of DTs; such a process would offer a positive contribution to the DM process. We also offer some new approaches for measuring some of the performance criteria.
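
As an illustration only, the idea of comparing many candidate DTs on several performance measures can be sketched as a simple weighted-sum ranking. This is not the process proposed in the paper, whose details the summary does not give; the criteria chosen (accuracy, number of leaves as a simplicity proxy, lift), the weights, the min-max normalization, and every name and number below are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class TreeScores:
    """Hypothetical performance measures for one candidate decision tree."""
    name: str
    accuracy: float  # fraction correctly classified on held-out data (higher is better)
    leaves: int      # number of leaves, used as a simplicity proxy (lower is better)
    lift: float      # lift in the top-scored segment (higher is better)


def normalize(values, higher_is_better=True):
    """Min-max scale values to [0, 1] so that 1 is always the best value."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All candidates tie on this criterion.
        return [1.0 for _ in values]
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def rank_trees(trees, weights):
    """Return (name, score) pairs sorted from best to worst weighted score."""
    acc = normalize([t.accuracy for t in trees])
    simp = normalize([t.leaves for t in trees], higher_is_better=False)
    lift = normalize([t.lift for t in trees])
    total = sum(weights.values())
    scored = []
    for t, a, s, l in zip(trees, acc, simp, lift):
        score = (weights["accuracy"] * a
                 + weights["simplicity"] * s
                 + weights["lift"] * l) / total
        scored.append((t.name, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up candidates and weights, for illustration only.
candidates = [
    TreeScores("tree_a", accuracy=0.91, leaves=40, lift=2.1),
    TreeScores("tree_b", accuracy=0.89, leaves=12, lift=2.0),
    TreeScores("tree_c", accuracy=0.85, leaves=6, lift=1.6),
]
weights = {"accuracy": 0.5, "simplicity": 0.3, "lift": 0.2}

ranking = rank_trees(candidates, weights)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With these invented numbers, the slightly less accurate but much smaller tree_b ranks ahead of tree_a, which shows how weighting several criteria can change the choice that accuracy alone would suggest. In practice the weights might be elicited with a method such as the Analytic Hierarchy Process cited in the references, rather than fixed by hand.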

MSC:

68P15 Database theory
Full Text: DOI

References:

[1] Bradley, A., The use of the area under the ROC curve in the evaluation of machine learning algorithms, Pattern Recognition, 30, 7, 1145-1159 (1997)
[2] Bohanec, M.; Bratko, I., Trading accuracy for simplicity in decision trees, Machine Learning, 15, 223-250 (1994) · Zbl 0811.68112
[3] Piatetsky-Shapiro, G.; Steingold, S., Measuring lift quality in database marketing, SIGKDD Explorations, 2, 2, 76-80 (2001)
[4] Berry, M.; Linoff, G., Mastering data mining: the art and science of customer relationship management (2000), Wiley: New York, NY
[5] Han, J.; Kamber, M., Data mining: concepts and techniques (2001), Morgan Kaufmann: New York, NY
[6] Lim, T.-S.; Loh, W.-Y.; Shih, Y.-S., A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms, Machine Learning, 40, 203-228 (2000) · Zbl 0969.68669
[7] Garofalakis M, Hyun D, Rastogi R, Shim K. Efficient Algorithms for Constructing Decision Trees with Constraints. Proceedings of the 6th ACM SIGKDD International Conference on Data Mining and Knowledge Discovery (KDD-2000), Boston, MA, 2000. p. 335-9.
[8] Provost F, Fawcett T, Kohavi R. The Case Against Accuracy Estimation for Comparing Induction Algorithms. In: Shavlik J, editor. Proceedings of the Fifteenth International Conference on Machine Learning (ICML98), San Francisco, CA: Morgan Kaufmann, 1998. p. 445-53.
[9] Gersten W, Wirth R, Arndt D. Predictive Modeling in Automotive Direct Marketing: Tools, Experiences and Open Issues. Proceedings of the 6th ACM SIGKDD International Conference on Data Mining and Knowledge Discovery (KDD-2000), Boston, MA, 2000. p. 398-406.
[10] Kim, H.; Koehler, G., Theory and practice of decision tree induction, Omega, 23, 6, 637-652 (1995)
[11] Esposito, F.; Malerba, D.; Semeraro, G., A Comparative Analysis of Methods for Pruning Decision Trees, IEEE Transactions on Pattern Analysis and Machine Intelligence, 19, 476-491 (1997)
[12] Saaty, T., The Analytic Hierarchy Process: Planning, Priority Setting, Resource Allocation (1980), McGraw-Hill: New York · Zbl 0587.90002
[13] Bryson, N., A Goal Programming for Generating Priority Vectors, Journal of the Operational Research Society, 46, 641-648 (1995) · Zbl 0830.90001
[14] Bryson, N.; Mobolurin, A.; Ngwenyama, O., Modelling pairwise comparisons on ratio scales, European Journal of Operational Research, 83, 639-654 (1995) · Zbl 0899.90117
[15] Bryson, N.; Joseph, A., Generating consensus priority interval vectors for group decision making in the AHP, Journal of Multi-Criteria Decision Analysis, 9, 4, 127-137 (2000) · Zbl 1028.90525