×

Oracle inequalities for support vector machines that are based on random entropy numbers. (English) Zbl 1192.68540

Summary: We present a new technique for bounding local Rademacher averages of function classes induced by a loss function and a Reproducing Kernel Hilbert Space (RKHS). At the heart of this technique lies the observation that certain expectations of random entropy numbers can be bounded by the eigenvalues of the integral operator associated with the RKHS. We then work out the details of the new technique by establishing two new oracle inequalities for support vector machines, which complement and generalize previous results.

MSC:

68T05 Learning and adaptive systems in artificial intelligence
68Q32 Computational learning theory
Full Text: DOI

References:

[1] Blanchard, G.; Bousquet, O.; Massart, P., Statistical performance of support vector machines, Ann. Statist., 36, 489-531 (2008) · Zbl 1133.62044
[2] Steinwart, I.; Hush, D.; Scovel, C., A new concentration result for regularized risk minimizers, (Giné, E.; Koltchinskii, V.; Li, W.; Zinn, J., High Dimensional Probability IV (2006), Institute of Mathematical Statistics: Institute of Mathematical Statistics Beachwood, OH), 260-275 · Zbl 1127.68090
[3] Steinwart, I.; Scovel, C., Fast rates for support vector machines using Gaussian kernels, Ann. Statist., 35, 575-607 (2007) · Zbl 1127.68091
[4] Wu, Q.; Ying, Y.; Zhou, D.-X., Multi-kernel regularized classifiers, J. Complexity, 23, 108-134 (2007) · Zbl 1171.65043
[5] Bartlett, P. L.; Bousquet, O.; Mendelson, S., Localized Rademacher complexities, (Kivinen, J.; Sloan, R. H., Proceedings of the 15th Annual Conference on Learning Theory (2002), Springer: Springer New York), 44-58 · Zbl 1050.68054
[6] Mendelson, S., Improving the sample complexity using global data, IEEE Trans. Inform. Theory, 48, 1977-1991 (2002) · Zbl 1061.68128
[7] Mendelson, S., On the performance of kernel classes, J. Mach. Learn. Res., 4, 759-771 (2003) · Zbl 1083.68097
[8] van der Vaart, A. W.; Wellner, J. A., Weak Convergence and Empirical Processes (1996), Springer: Springer New York · Zbl 0862.60002
[9] Steinwart, I.; Christmann, A., Support Vector Machines (2008), Springer: Springer New York · Zbl 1203.68171
[10] Williamson, R.; Smola, A. J.; Schölkopf, B., Generalization performance of regularization networks and support vector machines via entropy numbers of compact operators, IEEE Trans. Inform. Theory, 47, 2516-2532 (2001) · Zbl 1008.62507
[11] Zhou, D. X., The covering number in learning theory, J. Complexity, 18, 739-767 (2002) · Zbl 1016.68044
[12] Carl, B.; Stephani, I., Entropy, Compactness and the Approximation of Operators (1990), Cambridge University Press: Cambridge University Press Cambridge · Zbl 0705.47017
[13] Cristianini, N.; Shawe-Taylor, J., An Introduction to Support Vector Machines (2000), Cambridge University Press: Cambridge University Press Cambridge
[14] Schölkopf, B.; Smola, A. J., Learning with Kernels (2002), MIT Press: MIT Press Cambridge, MA
[15] Steinwart, I.; Scovel, C., Fast rates for support vector machines, (Auer, P.; Meir, R., Proceedings of the 18th Annual Conference on Learning Theory (2005), Springer: Springer New York), 279-294 · Zbl 1137.68564
[16] Dudley, R. M., The sizes of compact subsets of Hilbert spaces and continuity of Gaussian processes, J. Funct. Anal., 1, 290-330 (1967) · Zbl 0188.20502
[17] Mendelson, S., A few notes on statistical learning theory, (Mendelson, S.; Smola, A., Advanced Lectures on Machine Learning: Machine Learning Summer School 2002, Canberra, Australia (2003), Springer: Springer Berlin), 1-40 · Zbl 1019.68093
[18] Pietsch, A., Eigenvalues and \(s\)-Numbers (1987), Geest & Portig K.-G.: Geest & Portig K.-G. Leipzig · Zbl 0615.47019
[19] Shawe-Taylor, J.; Williams, C. K.I.; Cristianini, N.; Kandola, J., On the eigenspectrum of the Gram matrix and its relationship to the operator eigenspectrum, (Cesa-Bianchi, N.; Numao, M.; Reischuk, R., Algorithmic Learning Theory (13th International Conference) (2002), Springer: Springer New York), 23-40 · Zbl 1024.68060
[20] Shawe-Taylor, J.; Williams, C. K.I.; Cristianini, N.; Kandola, J., On the eigenspectrum of the Gram matrix and the generalization error of kernel-PCA, IEEE Trans. Inform. Theory, 51, 2510-2522 (2005) · Zbl 1310.15076
[21] Zwald, L.; Bousquet, O.; Blanchard, G., Statistical properties of kernel principal component analysis, (Shawe-Taylor, J.; Singer, Y., Proceedings of the 17th Annual Conference on Learning Theory (2004), Springer: Springer New York), 594-608 · Zbl 1078.68133
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.