
Ensembling neural networks: Many could be better than all. (English) Zbl 0995.68077

Artif. Intell. 137, No. 1-2, 239-263 (2002); corrigendum ibid. 174, No. 18, 1570 (2010).
Summary: A neural network ensemble is a learning paradigm in which many neural networks are jointly used to solve a problem. In this paper, the relationship between the ensemble and its component neural networks is analyzed in the context of both regression and classification, revealing that it may be better to ensemble many, rather than all, of the neural networks at hand. This result is interesting because at present most approaches ensemble all the available neural networks for prediction. Then, to show that appropriate neural networks for composing an ensemble can be effectively selected from a set of available neural networks, an approach named GASEN is presented. GASEN first trains a number of neural networks. It then assigns random weights to those networks and employs a genetic algorithm to evolve the weights so that they characterize, to some extent, the fitness of the neural networks in constituting an ensemble. Finally, it selects some neural networks based on the evolved weights to make up the ensemble. A large empirical study shows that, compared with popular ensemble approaches such as Bagging and Boosting, GASEN can generate neural network ensembles with far smaller sizes but stronger generalization ability. Furthermore, to understand the working mechanism of GASEN, the bias-variance decomposition of the error is provided, which shows that the success of GASEN may lie in its ability to significantly reduce the bias as well as the variance.
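The three GASEN steps named in the summary (train, evolve weights, select) can be illustrated with a minimal sketch. This is not the authors' implementation. The summary does not state the fitness function, the genetic operators, or the selection rule, so the following are all assumptions made for illustration: the fitness is the negative validation mean squared error of the weighted ensemble, the genetic algorithm uses binary tournament selection with arithmetic crossover and Gaussian mutation, and a component is kept when its normalized weight exceeds 1/N. The function name `gasen_select` and its parameters are hypothetical, and the "trained networks" are replaced by precomputed validation predictions.

```python
import numpy as np

def gasen_select(preds, y, pop_size=30, generations=50, seed=0):
    """Sketch of weight evolution and selection (assumed details, see text).

    preds: (n_models, n_samples) validation predictions of the candidates.
    y:     (n_samples,) validation targets.
    Returns (indices of selected components, evolved normalized weights).
    """
    rng = np.random.default_rng(seed)
    n_models = preds.shape[0]

    def normalize(w):
        # Make weights positive and sum to one along the last axis.
        w = np.abs(w) + 1e-12
        return w / w.sum(axis=-1, keepdims=True)

    def fitness(pop):
        # Assumed fitness: negative MSE of each weighted ensemble.
        ens = normalize(pop) @ preds          # (pop_size, n_samples)
        return -np.mean((ens - y) ** 2, axis=1)

    # Step 2a: assign random weights to the candidate networks.
    pop = rng.random((pop_size, n_models))
    for _ in range(generations):
        fit = fitness(pop)
        elite = pop[np.argmax(fit)].copy()    # keep the best individual
        # Binary tournament selection.
        a = rng.integers(pop_size, size=pop_size)
        b = rng.integers(pop_size, size=pop_size)
        parents = np.where((fit[a] >= fit[b])[:, None], pop[a], pop[b])
        # Arithmetic crossover with a randomly shuffled mate.
        mates = parents[rng.permutation(pop_size)]
        alpha = rng.random((pop_size, 1))
        children = alpha * parents + (1.0 - alpha) * mates
        # Gaussian mutation, then clip back into [0, 1].
        children += rng.normal(0.0, 0.05, size=children.shape)
        pop = np.clip(children, 0.0, 1.0)
        pop[0] = elite

    # Step 3: select components whose evolved weight exceeds 1/N (assumed rule).
    best = normalize(pop[np.argmax(fitness(pop))])
    selected = np.flatnonzero(best > 1.0 / n_models)
    if selected.size == 0:                    # guard for the all-equal case
        selected = np.array([int(np.argmax(best))])
    return selected, best

# Toy usage: 8 stand-in "networks" that predict sin(x) with different noise levels.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 3.0, 200)
y = np.sin(x)
noise = np.array([0.05, 0.05, 0.1, 0.1, 0.5, 0.5, 1.0, 1.0])
preds = y + noise[:, None] * rng.normal(size=(8, 200))
selected, weights = gasen_select(preds, y)
print("selected components:", selected)
```

Because the normalized weights sum to one, at least one candidate always falls at or below the 1/N threshold, so under this assumed rule the selected ensemble is always a proper subset of the candidates. The selected components would then be combined without the evolved weights, for example by simple averaging, which is one reading of "selects some neural networks based on the evolved weights".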

MSC:

68T05 Learning and adaptive systems in artificial intelligence
