Document Zbl 1505.62073

Oblique decision tree induction by cross-entropy optimization based on the von Mises-Fisher distribution. (English) Zbl 1505.62073

Comput. Stat. 37, No. 5, 2203-2229 (2022).

Summary: Oblique decision trees recursively divide the feature space by using splits based on linear combinations of attributes. Compared to their univariate counterparts, which only use a single attribute per split, they are often smaller and more accurate. A common approach to learn decision trees is by iteratively introducing splits on a training set in a top-down manner, yet determining a single optimal oblique split is in general computationally intractable. Therefore, one has to rely on heuristics to find near-optimal splits. In this paper, we adapt the cross-entropy optimization method to tackle this problem. The approach is motivated geometrically by the observation that equivalent oblique splits can be interpreted as connected regions on a unit hypersphere which are defined by the samples in the training data. In each iteration, the algorithm samples multiple candidate solutions from this hypersphere using the von Mises-Fisher distribution which is parameterized by a mean direction and a concentration parameter. These parameters are then updated based on the best performing samples such that when the algorithm terminates a high probability mass is assigned to a region of near-optimal solutions. Our experimental results show that the proposed method is well-suited for the induction of compact and accurate oblique decision trees in a small amount of time.

Cited in 1 Document

MSC:

62-08	Computational methods for problems pertaining to statistics
62H30	Classification and discrimination; cluster analysis (statistical aspects)
68T05	Learning and adaptive systems in artificial intelligence

Keywords:

oblique decision trees; cross-entropy optimization; von Mises-Fisher distribution; classification; regression

Software:

Scikit; HHCART; UCI-ml

Cite Review PDF

Full Text: DOI

OA License

References:

[1]	Banerjee, A.; Dhillon, IS; Ghosh, J.; Sra, S., Clustering on the unit hypersphere using von Mises-Fisher distributions, J Mach Learn Res, 6, Sep, 1345-1382 (2005) · Zbl 1190.62116
[2]	Bertsimas, D.; Dunn, J., Optimal classification trees, Mach Learn, 106, 7, 1039-1082 (2017) · Zbl 1455.68159 · doi:10.1007/s10994-017-5633-9
[3]	Blanquero, R.; Carrizosa, E.; Molero-Río, C.; Morales, DR, Sparsity in optimal randomized classification trees, Eur J Oper Res, 284, 1, 255-272 (2020) · Zbl 1441.62163 · doi:10.1016/j.ejor.2019.12.002
[4]	Breiman, L.; Friedman, JH; Olshen, RA; Stone, CJ, Classification and regression trees (1984), Monterey: Wadsworth and Brooks, Monterey · Zbl 0541.62042
[5]	Cantú-Paz, E.; Kamath, C., Inducing oblique decision trees with evolutionary algorithms, IEEE Trans Evol Comput, 7, 1, 54-68 (2003) · doi:10.1109/TEVC.2002.806857
[6]	Cover, TM, Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition, IEEE Trans Electron Comput, 3, 326-334 (1965) · Zbl 0152.18206 · doi:10.1109/PGEC.1965.264137
[7]	De Boer, PT; Kroese, DP; Mannor, S.; Rubinstein, RY, A tutorial on the cross-entropy method, Ann Oper Res, 134, 1, 19-67 (2005) · Zbl 1075.90066 · doi:10.1007/s10479-005-5724-z
[8]	Demšar, J., Statistical comparisons of classifiers over multiple data sets, J Mach Learn Res, 7, 1-30 (2006) · Zbl 1222.68184
[9]	Dheeru D, Karra Taniskidou E (2017) UCI machine learning repository. http://archive.ics.uci.edu/ml. Accesed 2 October 2020
[10]	Dhillon IS, Sra S (2003) Modeling data using directional distributions. Technical reports on TR-03-06, Department of Computer Sciences, The University of Texas at Austin, Austin, TX, USA
[11]	Fisher, RA, The use of multiple measurements in taxonomic problems, Ann Eugen, 7, 2, 179-188 (1936) · doi:10.1111/j.1469-1809.1936.tb02137.x
[12]	Fisher, RA, Dispersion on a sphere, Proc R Soc Lond Ser A Math Phys Sci, 217, 1130, 295-305 (1953) · Zbl 0051.37105 · doi:10.1098/rspa.1953.0064
[13]	Friedman, M., The use of ranks to avoid the assumption of normality implicit in the analysis of variance, J Am Stat Assoc, 32, 200, 675-701 (1937) · JFM 63.1098.02 · doi:10.1080/01621459.1937.10503522
[14]	Friedman, M., A comparison of alternative tests of significance for the problem of m rankings, Ann Math Stat, 11, 1, 86-92 (1940) · Zbl 0063.01455 · doi:10.1214/aoms/1177731944
[15]	Heath D, Kasif S, Salzberg S (1993) Induction of oblique decision trees. In: Proceedings of the thirteenth international joint conference on artificial intelligence. Morgan Kaufmann Publishers, pp 1002-1007
[16]	Heath DG (1993) A geometric framework for machine learning. Ph.D. Thesis, Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
[17]	Holm, S., A simple sequentially rejective multiple test procedure, Scand J Stat, 6, 2, 65-70 (1979) · Zbl 0402.62058
[18]	Li, XB; Sweigart, JR; Teng, JT; Donohue, JM; Thombs, LA; Wang, SM, Multivariate decision trees using linear discriminants and tabu search, IEEE Trans Syst Man Cybern Part A Syst Hum, 33, 2, 194-205 (2003) · doi:10.1109/TSMCA.2002.806499
[19]	Loh, WY; Shih, YS, Split selection methods for classification trees, Stat Sinica, 7, 4, 815-840 (1997) · Zbl 1067.62545
[20]	López-Chau A, Cervantes J, López-García L, Lamont FG (2013) Fisher’s decision tree. Expert Syst Appl 40(16):6283-6291. doi:10.1016/j.eswa.2013.05.044
[21]	Mannor S, Peleg D, Rubinstein R (2005) The cross entropy method for classification. In: Proceedings of the 22nd international conference on machine learning, association for computing machinery, New York, NY, USA, ICML’05, pp 561-568, doi:10.1145/1102351.1102422
[22]	Mardia, KV; Jupp, PE, Directional statistics (1999), Chichester: Wiley, Chichester · doi:10.1002/9780470316979
[23]	Mola, F.; Siciliano, R.; Roli, F.; Kittler, J., Discriminant analysis and factorial multiple splits in recursive partitioning for data mining, Multiple classifier systems, 118-126 (2002), Berlin: Springer, Berlin · Zbl 1047.68877 · doi:10.1007/3-540-45428-4_12
[24]	Murthy SK, Kasif S, Salzberg S, Beigel R (1993) Oc1: A randomized algorithm for building oblique decision trees. In: Proceedings of AAAI. Citeseer, pp 322-327
[25]	Murthy, SK; Kasif, S.; Salzberg, S., A system for induction of oblique decision trees, J Artif Intell Res, 2, 1-32 (1994) · Zbl 0900.68335 · doi:10.1613/jair.63
[26]	Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; Vanderplas, J.; Passos, A.; Cournapeau, D.; Brucher, M.; Perrot, M.; Duchesnay, E., Scikit-learn: machine learning in Python, J Mach Learn Res, 12, 2825-2830 (2011) · Zbl 1280.68189
[27]	Rubinstein, R., The cross-entropy method for combinatorial and continuous optimization, Methodol Comput Appl Probab, 1, 2, 127-190 (1999) · Zbl 0941.65061 · doi:10.1023/A:1010091220143
[28]	Rubinstein, RY, Optimization of computer simulation models with rare events, Eur J Oper Res, 99, 1, 89-112 (1997) · Zbl 0923.90051 · doi:10.1016/S0377-2217(96)00385-2
[29]	Rubinstein, RY; Kroese, DP, The cross-entropy method: a unified approach to combinatorial optimization, Monte-Carlo simulation, and machine learning (2004), Berlin: Springer, Berlin · Zbl 1140.90005 · doi:10.1007/978-1-4757-4321-0
[30]	Siciliano R, Aria M, D’Ambrosio A (2008) Posterior prediction modelling of optimal trees. In: Brito P (ed) COMPSTAT 2008, Physica-Verlag HD, Heidelberg, pp 323-334, doi:10.1007/978-3-7908-2084-3_27 · Zbl 1148.62050
[31]	Sra, S., A short note on parameter approximation for von Mises-Fisher distributions: and a fast implementation of \(I_s (x)\), Comput Stat, 27, 1, 177-190 (2012) · Zbl 1304.65075 · doi:10.1007/s00180-011-0232-x
[32]	Truong AKY (2009) Fast growing and interpretable oblique trees via logistic regression models. Ph.D. Thesis, Oxford University, Oxford, United Kingdom
[33]	Ulrich, G., Computer generation of distributions on the m-sphere, J R Stat Soc Ser C (Appl Stat), 33, 2, 158-163 (1984) · Zbl 0547.65095 · doi:10.2307/2347441
[34]	Wickramarachchi, D.; Robertson, B.; Reale, M.; Price, C.; Brown, J., HHCART: an oblique decision tree, Comput Stat Data Anal, 96, 12-23 (2016) · Zbl 1468.62209 · doi:10.1016/j.csda.2015.11.006
[35]	Wood, AT, Simulation of the von Mises-Fisher distribution, Commun Stat Simul Comput, 23, 1, 157-164 (1994) · Zbl 0825.62022 · doi:10.1080/03610919408813161
[36]	Zlochin, M.; Birattari, M.; Meuleau, N.; Dorigo, M., Model-based search for combinatorial optimization: a critical survey, Ann Oper Res, 131, 1-4, 373-395 (2004) · Zbl 1067.90162 · doi:10.1023/B:ANOR.0000039526.52305.af

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.