
Sparse ensembles using weighted combination methods based on linear programming. (English) Zbl 1207.68332

Summary: An ensemble of multiple classifiers is widely considered an effective technique for improving the accuracy and stability of a single classifier. This paper proposes a framework of sparse ensembles and presents new linear weighted combination methods for them. A sparse ensemble combines the outputs of multiple classifiers using a sparse weight vector. When the continuous outputs of multiple classifiers are available, solving for the sparse weight vector can be formulated as a linear programming problem that exploits the hinge loss and/or the 1-norm regularization, both of which are sparsity-inducing techniques widely used in machine learning. Only classifiers with nonzero weight coefficients are included in the ensemble. In these LP-based methods, the ensemble training error is minimized while the weight vector is controlled, which can be viewed as implementing the structural risk minimization principle and naturally explains the good performance of these methods. Promising experimental results on UCI data sets and radar high-resolution range profile data are presented.
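To illustrate the kind of formulation described above, the following is a minimal sketch of a hinge-loss, 1-norm-regularized LP for learning a sparse combination weight vector, solved with SciPy's `linprog`. The names `F`, `y`, and `lam`, the nonnegativity constraint on the weights, and the use of SciPy as the solver are assumptions for illustration; the paper's exact variables and constraints may differ.

```python
# Sketch: learn a sparse, nonnegative weight vector that linearly combines the
# continuous outputs of T base classifiers, minimizing hinge loss plus a
# 1-norm penalty via a linear program (assumed formulation, not the paper's exact one).
import numpy as np
from scipy.optimize import linprog

def sparse_ensemble_weights(F, y, lam=0.1):
    """F: (N, T) continuous outputs of T base classifiers on N training samples,
    y: (N,) labels in {-1, +1}, lam: trade-off for the 1-norm penalty.
    Returns a (T,) weight vector; many entries are typically exactly zero."""
    N, T = F.shape
    # Decision variables: [w_1..w_T, xi_1..xi_N], all nonnegative,
    # so the 1-norm of w reduces to the sum of its entries.
    c = np.concatenate([np.full(T, lam), np.full(N, 1.0 / N)])
    # Hinge constraints  y_i * (F[i] @ w) >= 1 - xi_i,
    # rewritten as      -y_i * F[i] @ w - xi_i <= -1  for linprog's A_ub form.
    A_ub = np.hstack([-(y[:, None] * F), -np.eye(N)])
    b_ub = -np.ones(N)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None)] * (T + N), method="highs")
    w = res.x[:T]
    w[w < 1e-8] = 0.0  # treat numerically tiny weights as exact zeros
    return w
```

Only the classifiers with nonzero weights enter the final ensemble decision, e.g. `sign(F_new @ w)` for a new sample's base-classifier outputs `F_new`.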

MSC:

68T10 Pattern recognition, speech recognition
90C05 Linear programming

Software:

UCI-ml
Full Text: DOI
