Pegasos: primal estimated sub-gradient solver for SVM


Abstract

We describe and analyze a simple and effective stochastic sub-gradient descent algorithm for solving the optimization problem cast by Support Vector Machines (SVM). We prove that the number of iterations required to obtain a solution of accuracy $\epsilon$ is $\tilde{O}(1/\epsilon)$, where each iteration operates on a single training example. In contrast, previous analyses of stochastic gradient descent methods for SVMs require $\Omega(1/\epsilon^2)$ iterations. As in previously devised SVM solvers, the number of iterations also scales linearly with $1/\lambda$, where $\lambda$ is the regularization parameter of SVM. For a linear kernel, the total run-time of our method is $\tilde{O}(d/(\lambda\epsilon))$, where $d$ is a bound on the number of non-zero features in each example. Since the run-time does not depend directly on the size of the training set, the resulting algorithm is especially suited for learning from large datasets. Our approach also extends to non-linear kernels while working solely on the primal objective function, though in this case the runtime does depend linearly on the training set size. Our algorithm is particularly well suited for large text classification problems, where we demonstrate an order-of-magnitude speedup over previous SVM learning methods.
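The abstract does not spell out the update rule, so the following is only a minimal sketch of a stochastic sub-gradient step on the primal SVM objective, in the form Pegasos is commonly described: one randomly chosen example per iteration and a step size of $1/(\lambda t)$. The function names `pegasos` and `objective`, the toy data, and the choice to return the last iterate are illustrative assumptions, and the optional projection step and mini-batch variants are omitted.

```python
import random


def pegasos(examples, lam, num_iters, seed=0):
    """Sketch of a Pegasos-style solver for a linear SVM (no bias term).

    examples: list of (x, y) pairs, x a list of floats, y in {-1, +1}.
    lam: regularization parameter (lambda in the abstract).
    Returns the last iterate w (an assumption; averaging is also possible).
    """
    rng = random.Random(seed)
    w = [0.0] * len(examples[0][0])
    for t in range(1, num_iters + 1):
        x, y = rng.choice(examples)      # one training example per iteration
        eta = 1.0 / (lam * t)            # step size 1 / (lambda * t)
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        shrink = 1.0 - eta * lam         # contribution of the regularizer
        w = [shrink * wi for wi in w]
        if margin < 1.0:                 # hinge loss is active for this example
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def objective(w, examples, lam):
    """Primal objective: (lam / 2) * ||w||^2 + average hinge loss."""
    reg = 0.5 * lam * sum(wi * wi for wi in w)
    hinge = sum(
        max(0.0, 1.0 - y * sum(wi * xi for wi, xi in zip(w, x)))
        for x, y in examples
    )
    return reg + hinge / len(examples)


# Hypothetical toy data: two linearly separable clusters in the plane.
toy = [
    ([2.0, 1.0], 1), ([1.0, 2.0], 1), ([2.0, 2.0], 1),
    ([-2.0, -1.0], -1), ([-1.0, -2.0], -1), ([-2.0, -2.0], -1),
]

w = pegasos(toy, lam=0.1, num_iters=1000)
print("weights:", w)
print("objective:", objective(w, toy, lam=0.1))
```

Each iteration touches only the features of the sampled example, which is consistent with the abstract's point that the per-iteration cost depends on $d$ rather than on the training set size.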



Author information

Authors and Affiliations

School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
Shai Shalev-Shwartz
Google, Mountain View, CA, USA
Yoram Singer
Toyota Technological Institute at Chicago, Chicago, IL, USA
Nathan Srebro & Andrew Cotter

Corresponding author

Correspondence to Nathan Srebro.

About this article

Cite this article

Shalev-Shwartz, S., Singer, Y., Srebro, N. et al. Pegasos: primal estimated sub-gradient solver for SVM. Math. Program. 127, 3–30 (2011). https://doi.org/10.1007/s10107-010-0420-4

Received: 30 August 2009
Accepted: 17 November 2009
Published: 16 October 2010
Issue Date: March 2011
DOI: https://doi.org/10.1007/s10107-010-0420-4
