
(Psycho-)analysis of benchmark experiments: a formal framework for investigating the relationship between data sets and learning algorithms. (English) Zbl 1471.62058

Summary: It is common knowledge that the performance of different learning algorithms depends on certain characteristics of the data – such as dimensionality, linear separability or sample size. However, formally investigating this relationship in an objective and reproducible way is not trivial. A new formal framework for describing the relationship between data set characteristics and the performance of different learning algorithms is proposed. The framework combines three components: benchmark experiments, a formal description of data set characteristics by means of statistical and information-theoretic measures, and recursive partitioning of Bradley-Terry models for comparing the algorithms’ performances. The formal aspects of each component are introduced and illustrated by means of an artificial example. Real-world usage is demonstrated with an application example consisting of thirteen widely used data sets and six common learning algorithms. The Appendix provides information on the implementation and usage of the framework within the R language.
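To make the Bradley-Terry component concrete: a benchmark experiment yields, for each pair of algorithms, counts of how often one outperformed the other across replications, and the Bradley-Terry model turns these paired comparisons into a "worth" parameter per algorithm. The sketch below (in Python rather than the R implementation the paper uses; the function name and toy win matrix are illustrative assumptions, not from the paper) fits such a model with the classical MM (Zermelo) iteration.

```python
# Minimal sketch of a Bradley-Terry fit on pairwise "algorithm i beat
# algorithm j" counts, as a benchmark experiment on one data set might
# produce.  The win matrix below is a made-up toy example.

def bradley_terry(wins, n_iter=200):
    """wins[i][j] = number of replications where algorithm i outperformed
    algorithm j.  Returns worth parameters (summing to 1), estimated with
    the standard MM iteration: p_i <- W_i / sum_j n_ij / (p_i + p_j)."""
    k = len(wins)
    p = [1.0 / k] * k
    for _ in range(n_iter):
        new_p = []
        for i in range(k):
            w_i = sum(wins[i])  # total wins of algorithm i
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(k) if j != i)
            new_p.append(w_i / denom if denom > 0 else p[i])
        s = sum(new_p)
        p = [v / s for v in new_p]  # normalize so worths sum to 1
    return p

# Toy example: three algorithms compared pairwise over 10 replications.
wins = [[0, 8, 9],
        [2, 0, 6],
        [1, 4, 0]]
worth = bradley_terry(wins)
```

In the paper's framework these worth parameters are not estimated once globally but within the leaves of a model-based recursive partition, so that splits on data set characteristics (sample size, dimensionality, etc.) reveal where the algorithms' relative performance changes.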

MSC:

62-08 Computational methods for problems pertaining to statistics
62-04 Software, source code, etc. for problems pertaining to statistics
