Random Forest is an ensemble machine learning method developed by Leo Breiman in 2001. Since then, it has been considered a state-of-the-art solution in machine learning applications. Compared to other ensemble methods, random forests exhibit superior predictive performance. However, empirical and statistical studies show that the random forest algorithm generates an unnecessarily large number of base decision trees. This incurs high computational cost and prediction time, and can occasionally reduce effectiveness. In this paper, the authors survey existing random forest pruning techniques and compare their performance. The research covers both static and dynamic pruning techniques and analyses the scope for improving random forest performance through techniques including generating diverse and accurate decision trees, selecting high-performance subsets of decision trees, genetic algorithms, and other state-of-the-art methods, among others.
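As a hedged illustration of the subset-selection idea surveyed above, the sketch below greedily picks trees whose majority vote maximises accuracy on a validation set. The data layout (pre-computed 0/1 predictions per tree) and the greedy forward-selection rule are illustrative assumptions, not any specific paper's algorithm.

```python
# Static ensemble pruning sketch (hypothetical setup): each tree is
# represented by its pre-computed class predictions on a validation set,
# and trees are added greedily while the majority-vote accuracy improves.
from collections import Counter


def majority_vote(preds_per_tree, subset, n_samples):
    """Majority-vote prediction of the trees in `subset` for each sample."""
    votes = []
    for i in range(n_samples):
        counts = Counter(preds_per_tree[t][i] for t in subset)
        votes.append(counts.most_common(1)[0][0])
    return votes


def accuracy(pred, truth):
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def greedy_prune(preds_per_tree, y_val, max_trees):
    """Greedy forward selection: keep adding the tree that most improves
    validation accuracy; stop when no tree helps or max_trees is reached."""
    selected, best_acc = [], 0.0
    remaining = set(range(len(preds_per_tree)))
    while remaining and len(selected) < max_trees:
        best_t, best_gain = None, best_acc
        for t in sorted(remaining):
            acc = accuracy(
                majority_vote(preds_per_tree, selected + [t], len(y_val)), y_val
            )
            if acc > best_gain:
                best_t, best_gain = t, acc
        if best_t is None:  # no remaining tree improves the vote
            break
        selected.append(best_t)
        remaining.remove(best_t)
        best_acc = best_gain
    return selected, best_acc
```

With three toy trees, the selector keeps only the tree whose predictions match the validation labels, discarding the two redundant ones.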
Random Forest (RF) is an ensemble supervised machine learning technique that was developed by Breiman over a decade ago. Compared with other ensemble techniques, it has proved its accuracy and superiority. Many researchers, however, believe that there is still room for enhancing and improving its performance accuracy. This explains why, over the past decade, there have been many extensions of RF where each extension employed a variety of techniques and strategies to improve certain aspect(s) of RF. Since it has been proven empirically that ensembles tend to yield better results when there is significant diversity among the constituent models, the objective of this paper is twofold. First, it investigates how data clustering (a well-known diversity technique) can be applied to identify groups of similar decision trees in an RF in order to eliminate redundant trees by selecting a representative from each group (cluster). Second, these likely diverse representatives are then used to pr...
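A minimal sketch of the clustering idea described above: trees whose validation predictions mostly agree are grouped together, and one representative per group is kept. The disagreement measure, the similarity threshold, and the greedy single-pass grouping are illustrative choices, not the exact algorithm of the paper.

```python
# Clustering-based pruning sketch (assumed setup): each tree is described
# by its prediction vector on a validation set; trees within a small
# disagreement threshold of a cluster's first member join that cluster.


def disagreement(a, b):
    """Fraction of validation samples on which two trees disagree."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster_trees(preds_per_tree, threshold=0.1):
    """Greedy single-pass grouping: a tree joins the first cluster whose
    representative it agrees with closely enough, else starts a new one."""
    clusters = []  # each cluster is a list of tree indices
    for t, preds in enumerate(preds_per_tree):
        for cluster in clusters:
            rep = preds_per_tree[cluster[0]]
            if disagreement(preds, rep) <= threshold:
                cluster.append(t)
                break
        else:
            clusters.append([t])
    return clusters


def representatives(clusters):
    """Keep one tree per cluster; the redundant near-duplicates are dropped."""
    return [c[0] for c in clusters]
```

Two identical trees collapse into one cluster, so the pruned forest keeps one of them plus the dissimilar tree, preserving diversity while shrinking the ensemble.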
Systems Science & Control Engineering, 2014
2016
Random Forest (RF) is an ensemble classification technique that was developed by Leo Breiman over a decade ago. Compared with other ensemble techniques, it has proved its accuracy and superiority. Many researchers, however, believe that there is still room for further optimizing RF by improving its predictive accuracy. This explains why there have been many extensions of RF where each extension employed a variety of techniques and strategies to improve certain aspect(s) of RF. The main focus of this dissertation is to develop new extensions of RF using new optimization techniques that, to the best of our knowledge, have never been used before to optimize RF. These techniques are clustering, the local outlier factor, diversified weighted subspaces, and replicator dynamics. Applying these techniques to RF produced four extensions which we have termed CLUB-DRF, LOFB-DRF, DSB-RF, and RDB-DR respectively. Experimental studies on 15 real datasets showed favorable results, d...
IEEE Transactions on Information Theory
ArXiv, 2021
This appendix accompanies the paper ‘Improving the Accuracy-Memory Trade-Off of Random Forests Via Leaf-Refinement’. It provides results for more experiments which are not given in the paper due to space reasons. 1. Transformation of the Many-Could-Be-Better-Than-All-Theorem
Lecture Notes in Computer Science, 1998
2001
The aim of this paper is to propose a simple procedure that a priori determines a minimum number of classifiers to combine in order to obtain a prediction accuracy level similar to the one obtained with the combination of larger ensembles. The procedure is based on the McNemar non-parametric test of significance. Knowing a priori the minimum size of the classifier ensemble giving the best prediction accuracy saves time and memory, especially for huge databases and real-time applications. Here we applied this procedure to four multiple classifier systems with the C4.5 decision tree (Breiman's Bagging, Ho's Random Subspaces, their combination which we labeled 'Bagfs', and Breiman's Random Forests) and five large benchmark databases. It is worth noticing that the proposed procedure may easily be extended to base learning algorithms other than a decision tree as well. The experimental results showed that it is possible to limit significantly the number...
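To make the McNemar-based stopping idea concrete, here is a hedged sketch of the test itself: it compares the predictions of a smaller ensemble against a larger one on the same samples and asks whether their accuracies differ significantly. The chi-squared statistic with continuity correction and the 0.05 critical value are standard; the decision rule around them is an illustrative assumption, not the paper's exact procedure.

```python
# McNemar's test sketch: b counts samples classifier A gets right while B
# gets wrong, c counts the reverse; correct-on-both and wrong-on-both
# samples carry no information about the accuracy difference.


def mcnemar_statistic(pred_a, pred_b, truth):
    """Chi-squared statistic with continuity correction, df = 1."""
    b = sum(pa == t and pb != t for pa, pb, t in zip(pred_a, pred_b, truth))
    c = sum(pa != t and pb == t for pa, pb, t in zip(pred_a, pred_b, truth))
    if b + c == 0:  # the two classifiers never disagree in outcome
        return 0.0
    return (abs(b - c) - 1) ** 2 / (b + c)


def same_accuracy(pred_a, pred_b, truth, critical=3.841):
    """True if the accuracy difference is not significant at alpha = 0.05
    (3.841 is the chi-squared critical value for one degree of freedom)."""
    return mcnemar_statistic(pred_a, pred_b, truth) < critical
```

In the stopping procedure, one would grow the ensemble until `same_accuracy` holds between the current ensemble and a much larger reference one, and take that size as the minimum.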
Lecture Notes in Computer Science, 2010
International Journal for Scientific Research and Development, 2015
Int. J. Comput. Syst. Signals, 2000
Knowledge-Based Systems, 2019
Lecture Notes in Computer Science, 2005
WSEAS TRANSACTIONS ON SYSTEMS AND CONTROL, 2021
Applied Sciences
Statistics Surveys, 2009
International Journal of Emerging Trends in Engineering Research, 2022
Statistical Analysis and Data Mining: The ASA Data Science Journal, 2020
Random Forests and Decision Trees , 2012
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997
The Scientific World Journal, 2015