Support vector machines with adaptive \(L_q\) penalty. (English) Zbl 1446.62179

Summary: The standard support vector machine (SVM) minimizes the hinge loss function subject to the \(L_{2}\) penalty or the roughness penalty. Recently, the \(L_{1}\) SVM was suggested for variable selection by producing sparse solutions [P. S. Bradley and O. L. Mangasarian, “Feature selection via concave minimization and support vector machines”, in: Proceedings of the Fifteenth International Conference on Machine Learning, ICML’98. San Francisco, CA: Morgan Kaufmann (1998; doi:10.5555/645527.657467); J. Zhu et al., “1-norm support vector machines”, in: Proceedings of the 16th International Conference on Neural Information Processing Systems, NIPS’03. Cambridge, MA: MIT Press. 49–56 (2003; doi:10.5555/2981345.2981352)]. These learning methods are non-adaptive since their penalty forms are pre-determined before looking at the data, and they often perform well only in certain situations. For instance, the \(L_{2}\) SVM generally works well except when there are too many noise inputs, whereas the \(L_{1}\) SVM is preferred in the presence of many noise variables. In this article we propose and explore an adaptive learning procedure called the \(L_q\) SVM, where the best \(q>0\) is automatically chosen by the data. Both two- and multi-class classification problems are considered. We show that the new adaptive approach combines the benefits of a class of non-adaptive procedures and gives the best performance of this class across a variety of situations. Moreover, we observe that the proposed \(L_q\) penalty is more robust to noise variables than the \(L_{1}\) and \(L_{2}\) penalties. An iterative algorithm is suggested to solve the \(L_q\) SVM efficiently. Simulations and real data applications support the effectiveness of the proposed procedure.
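
To make the objective concrete: for a linear classifier the \(L_q\) SVM minimizes \(\frac{1}{n}\sum_{i=1}^{n}[1-y_i(w^\top x_i+b)]_+ + \lambda\sum_{j=1}^{p}|w_j|^q\), with \(q\) (and \(\lambda\)) tuned on the data. The sketch below is a minimal illustration under stated assumptions, not the authors' iterative algorithm: it uses plain subgradient descent with an \(\varepsilon\)-smoothed penalty gradient, and the function names (`lq_svm_fit`, `select_q`) and the cross-validated grid over \(q\) are hypothetical choices made for this example.

```python
import numpy as np

def lq_svm_fit(X, y, q=1.0, lam=0.1, lr=0.01, n_iter=2000, eps=1e-8):
    """Fit a linear L_q-penalized SVM by subgradient descent (illustrative only).

    Minimizes (1/n) * sum_i max(0, 1 - y_i (x_i @ w + b)) + lam * sum_j |w_j|^q.
    Labels y must be coded +1 / -1. For q < 1 the penalty is non-convex, so this
    only finds a local solution; `eps` guards the gradient of |w|^q near w = 0.
    """
    n, p = X.shape
    w = np.zeros(p)
    b = 0.0
    for _ in range(n_iter):
        margins = y * (X @ w + b)
        active = margins < 1                       # points violating the margin
        # subgradient of the average hinge loss
        grad_w = -(X[active].T @ y[active]) / n
        grad_b = -y[active].sum() / n
        # smoothed (sub)gradient of the L_q penalty term
        grad_w += lam * q * np.sign(w) * (np.abs(w) + eps) ** (q - 1)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def select_q(X, y, qs=(0.5, 1.0, 1.5, 2.0), lam=0.1, n_folds=5, seed=0):
    """Choose q by cross-validated accuracy, mimicking the adaptive choice of q."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    best_q, best_acc = None, -np.inf
    for q in qs:
        accs = []
        for k in range(n_folds):
            test = folds[k]
            train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            w, b = lq_svm_fit(X[train], y[train], q=q, lam=lam)
            accs.append(np.mean(np.sign(X[test] @ w + b) == y[test]))
        if np.mean(accs) > best_acc:
            best_q, best_acc = q, float(np.mean(accs))
    return best_q, best_acc
```

A typical call would be `best_q, acc = select_q(X, y)` followed by a final fit with the selected \(q\); the grid and the subgradient solver are placeholders for the paper's more efficient iterative scheme.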

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
68T05 Learning and adaptive systems in artificial intelligence

Software:

PDCO

References:

[1] Antoniadis, A.; Fan, J., Regularization of wavelets approximations, J. Amer. Statist. Assoc., 96, 939-967 (2001) · Zbl 1072.62561
[2] Boser, B., Guyon, I., Vapnik, V.N., 1992. A training algorithm for optimal margin classifiers. In: The Fifth Annual Conference on Computational Learning Theory. ACM, Pittsburgh, pp. 142-152.
[3] Bradley, P.; Mangasarian, O., Feature selection via concave minimization and support vector machines, (Shavlik, J., ICML’98 (1998), Morgan Kaufmann: Morgan Kaufmann Los Altos, CA)
[4] Chen, S.; Donoho, D. L.; Saunders, M. A., Atomic decomposition by basis pursuit, SIAM J. Sci. Comput., 20, 1, 33-61 (1999) · Zbl 0919.94002
[5] Crammer, K.; Singer, Y., On the algorithmic implementation of multiclass kernel-based vector machines, J. Mach. Learning Res., 2, 265-292 (2001) · Zbl 1037.68110
[6] Donoho, D.; Johnstone, I., Ideal spatial adaptation by wavelet shrinkage, Biometrika, 81, 425-455 (1994) · Zbl 0815.62019
[7] Efron, B.; Hastie, T.; Johnstone, I.; Tibshirani, R., Least angle regression, Ann. Statist., 32, 2, 407-499 (2004) · Zbl 1091.62054
[8] Fan, J.; Li, R., Variable selection via nonconcave penalized likelihood and its oracle properties, J. Amer. Statist. Assoc., 96, 456, 1348-1360 (2001) · Zbl 1073.62547
[9] Frank, I. E.; Friedman, J. H., A statistical view of some chemometrics regression tools, Technometrics, 35, 109-135 (1993) · Zbl 0775.62288
[10] Fu, W. J., Penalized regressions: the bridge vs the lasso, J. Comput. Graphical Statist., 7, 3, 397-416 (1998)
[11] Ikeda, K.; Murata, N., Geometrical properties of \(\nu\) support vector machines with different norms, Neural Comput., 17, 11, 2508-2529 (2005) · Zbl 1075.68074
[12] Knight, K.; Fu, W. J., Asymptotics for lasso-type estimators, Ann. Statist., 28, 5, 1356-1378 (2000) · Zbl 1105.62357
[13] Lee, Y.; Lin, Y.; Wahba, G., Multicategory support vector machines: theory and application to the classification of microarray data and satellite radiance data, J. Amer. Statist. Assoc., 99, 465, 67-81 (2004) · Zbl 1089.62511
[14] Lin, Y., Support vector machines and the Bayes rule in classification, Data Mining and Knowledge Discovery, 6, 259-275 (2002)
[15] Lin, Y.; Lee, Y.; Wahba, G., Support vector machines for classification in nonstandard situations, Mach. Learning, 46, 191-202 (2002) · Zbl 0998.68103
[16] Tibshirani, R. J., Regression shrinkage and selection via the Lasso, J. Roy. Statist. Soc. Ser. B, 58, 267-288 (1996) · Zbl 0850.62538
[17] Vapnik, V., Statistical Learning Theory (1998), Wiley: Wiley New York · Zbl 0935.62007
[18] Wahba, G., Support vector machines, reproducing kernel Hilbert spaces, and randomized GACV, (Schölkopf, B.; Burges, C. J.C.; Smola, A. J., Advances in Kernel Methods: Support Vector Learning (1998), MIT Press: MIT Press Cambridge, MA), 125-143 · Zbl 0935.68084
[19] Wahba, G.; Gu, C.; Wang, Y.; Chappell, R., Soft Classification, a.k.a. risk estimation, via penalized log likelihood and smoothing spline analysis of variance, (Petsche, T., Computational Learning Theory and Natural Learning Systems, vol. 3 (1995), MIT Press: MIT Press Cambridge, MA), 127-158 · Zbl 0861.68079
[20] Wang, L.; Shen, X., Multi-category support vector machines, feature selection, and solution path, Statist. Sinica, 16, 617-633 (2006) · Zbl 1096.62070
[21] Weston, J.; Watkins, C., Support vector machines for multi-class pattern recognition, (Proceedings of the Seventh European Symposium on Artificial Neural Networks (1999))
[22] Zhu, J.; Hastie, T.; Rosset, S.; Tibshirani, R., 1-norm support vector machines, Neural Inform. Process. Systems, 16 (2003)
[23] Zou, H.; Hastie, T., Regularization and variable selection via the elastic net, J. Roy. Statist. Soc. Ser. B, 67, 301-320 (2005) · Zbl 1069.62054