
Learning a priori constrained weighted majority votes. (English) Zbl 1319.68168

Summary: Weighted majority votes allow one to combine the outputs of several classifiers or voters. MinCq is a recent algorithm that optimizes the weight of each voter by minimizing a theoretical bound on the risk of the vote, with elegant PAC-Bayesian generalization guarantees. However, while it has demonstrated good performance when combining weak classifiers, MinCq cannot exploit the useful a priori knowledge that one may have when using a mixture of weak and strong voters. In this paper, we propose P-MinCq, an extension of MinCq that can incorporate such knowledge in the form of a constraint on the distribution of the weights, along with general proofs of convergence that hold in the sample compression setting for data-dependent voters. The approach is applied to a vote of \(k\)-NN classifiers with a specific modeling of the voters’ performance. P-MinCq significantly outperforms the classic \(k\)-NN classifier, a symmetric NN variant and MinCq using the same voters. We show that it is also competitive with LMNN, a popular metric learning algorithm, and that combining both approaches further reduces the error.
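
To make the setting concrete, the following minimal sketch (in Python, assuming scikit-learn is available) illustrates a weighted majority vote of \(k\)-NN voters in which the weight distribution is fixed a priori rather than learned; it does not implement the P-MinCq quadratic program itself, which would optimize these weights under the a priori constraint. All names in it (knn_majority_vote, ks, prior_weights) are illustrative, not from the paper.

# Minimal sketch, not the authors' P-MinCq solver: a majority vote of
# k-NN voters whose weight distribution is fixed a priori instead of
# being learned by minimizing the PAC-Bayesian bound.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def knn_majority_vote(X_train, y_train, X_test, ks, prior_weights):
    """Weighted majority vote of k-NN voters; labels must be in {-1, +1}."""
    w = np.asarray(prior_weights, dtype=float)
    w /= w.sum()                              # weights form a distribution
    agg = np.zeros(len(X_test))
    for k, wk in zip(ks, w):
        voter = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
        agg += wk * voter.predict(X_test)     # each voter outputs -1 or +1
    return np.where(agg >= 0, 1, -1)          # sign of the weighted vote

# Toy usage: three voters (k = 1, 3, 5) whose a priori weights encode a
# belief about each voter's strength (here, favoring small neighborhoods).
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
preds = knn_majority_vote(X[:100], y[:100], X[100:], ks=[1, 3, 5],
                          prior_weights=[0.5, 0.3, 0.2])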

MSC:

68T05 Learning and adaptive systems in artificial intelligence
62H30 Classification and discrimination; cluster analysis (statistical aspects)
91B12 Voting theory

Software:

LMNN

References:

[1] Atrey, P. K., Hossain, M. A., El Saddik, A., & Kankanhalli, M. S. (2010). Multimodal Fusion for Multimedia Analysis: a Survey. Multimedia systems, 16(6), 345-379. · doi:10.1007/s00530-010-0182-0
[2] Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123-140. · Zbl 0858.68080
[3] Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32. · Zbl 1007.68152 · doi:10.1023/A:1010933404324
[4] Devroye, L., Györfi, L., & Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. : Springer-Verlag. · Zbl 0853.68150
[5] Dietterich, T. G. (2000). Ensemble methods in machine learning. In: Multiple Classifier Systems, pp. 1-15.
[6] Domingos, P. (2000). Bayesian averaging of classifiers and the overfitting problem. In: International Conference on Machine Learning, pp. 223-230.
[7] Duda, R., Hart, P., & Stork, D. (2001). Pattern Classification. Wiley. books.google.fr/books?id=YoxQAAAAMAAJ. · Zbl 0968.68140
[8] Floyd, S., & Warmuth, M. K. (1995). Sample Compression, Learnability, and the Vapnik-Chervonenkis Dimension. Machine Learning, 21(3), 269-304.
[9] Freund, Y., & Schapire, R. (1996). Experiments with a new boosting algorithm. In: International Conference on Machine Learning, pp. 148-156.
[10] Germain, P., Lacoste, A., Laviolette, F., Marchand, M., & Shanian, S. (2011). A PAC-Bayes Sample Compression Approach to Kernel Methods. In: International Conference on Machine Learning.
[11] Graepel, T., Herbrich, R., & Shawe-Taylor, J. (2005). PAC-Bayesian Compression Bounds on the Prediction Error of Learning Algorithms for Classification. Machine Learning, 59(1-2), 55-76. · Zbl 1101.68561 · doi:10.1007/s10994-005-0462-7
[12] Haussler, D., Kearns, M., & Schapire, R. (1994). Bounds on the sample complexity of bayesian learning using information theory and the vc dimension. Machine Learning, 14(1), 83-113. · Zbl 0798.68145
[13] Kedem, D., Tyree, S., Weinberger, K., Sha, F., & Lanckriet, G. (2012). Non-linear Metric Learning. Advances in Neural Information Processing Systems, 25, 2582-2590.
[14] Lacasse, A., Laviolette, F., Marchand, M., Germain, P., & Usunier, N. (2007). PAC-Bayes Bounds for the Risk of the Majority Vote and the Variance of the Gibbs Classifier. In: Advances in Neural Information Processing Systems.
[15] Laviolette, F., & Marchand, M. (2007). PAC-Bayes Risk Bounds for Stochastic Averages and Majority Votes of Sample-Compressed Classifiers. Journal of Machine Learning Research, 8, 1461-1487. · Zbl 1222.62075
[16] Laviolette, F., Marchand, M., & Roy, J.-F. (2011). From PAC-Bayes Bounds to Quadratic Programs for Majority Votes. In: International Conference on Machine Learning.
[17] Lever, G., Laviolette, F., & Shawe-Taylor, J. (2013). Tighter pac-bayes bounds through distribution-dependent priors. Theoretical Computer Science, 473, 4-28. · Zbl 1277.68233 · doi:10.1016/j.tcs.2012.10.013
[18] Marchand, M., & Sokolova, M. (2005). Learning with Decision Lists of Data-Dependent Features. Journal of Machine Learning Research, 6, 427-451. · Zbl 1222.68257
[19] Maurer, A. (2004). A Note on the PAC Bayesian Theorem. CoRR, cs.LG/0411099.
[20] McAllester, D. A. (1999). PAC-Bayesian model averaging. In: Annual Conference on Learning Theory, pp. 164-170.
[21] McAllester, D. A. (2003). Simplified PAC-Bayesian margin bounds. In: Annual Conference on Learning Theory, pp. 203-215. · Zbl 1274.68337
[22] Morvant, E., Koço, S., & Ralaivola, L. (2012). PAC-Bayesian Generalization Bound on Confusion Matrix for Multi-Class Classification. In: International Conference on Machine Learning.
[23] Morvant, E., Habrard, A., & Ayache, S. (2014). Majority Vote of Diverse Classifiers for Late Fusion. In: IAPR Joint International Workshops on Statistical Techniques in Pattern Recognition and Structural and Syntactic Pattern Recognition.
[24] Nock, R., Sebban, M., & Bernard, D. (2003). A Simple Locally Adaptive Nearest Neighbor Rule With Application To Pollution Forecasting. International Journal of Pattern Recognition and Artificial Intelligence, 17(8), 1369-1382. · doi:10.1142/S0218001403002952
[25] Opelt, A., Fussenegger, M., Pinz, A., & Auer, P. (2004). Weak Hypotheses and Boosting for Generic Object Detection and Recognition. In: European Conference on Computer Vision, pp. 71-84. · Zbl 1098.68834
[26] Re, M., & Valentini, G. (2012). Ensemble methods: a review. In: Advances in Machine Learning and Data Mining for Astronomy, pp. 563-582.
[27] Schapire, R., & Singer, Y. (1999). Improved boosting using confidence-rated predictions. Machine Learning, 37(3), 297-336. · Zbl 0945.68194 · doi:10.1023/A:1007614523901
[28] Sun, S. (2013). A survey of multi-view machine learning. Neural Computing and Applications, 23(7-8), 2031-2038. · doi:10.1007/s00521-013-1362-6
[29] Weinberger, K. Q., & Saul, L. K. (2009). Distance Metric Learning for Large Margin Nearest Neighbor Classification. Journal of Machine Learning Research, 10, 207-244. · Zbl 1235.68204
[30] Xu, C., Tao, D., & Xu, C. (2013). A Survey on Multi-view Learning. Tech. rep., arXiv:1304.5634.