
Fast and Scalable Local Kernel Machines

Published: 01 August 2010

Abstract

A computationally efficient approach to local learning with kernel methods is presented. The Fast Local Kernel Support Vector Machine (FaLK-SVM) trains a set of local SVMs on redundant neighbourhoods of the training set, and an appropriate model for each query point is selected at testing time according to a proximity strategy. Supported by a recent result of Zakai and Ritov (2009) relating consistency and localizability, our approach achieves high classification accuracy by dividing the separation function into local optimisation problems that can be handled very efficiently from the computational viewpoint. The introduction of a fast local model selection further speeds up the learning process. Learning and complexity bounds are derived for FaLK-SVM, and the empirical evaluation of the approach (on data sets of up to 3 million points) shows that it is much faster, more accurate, and more scalable than state-of-the-art exact and approximate SVM solvers, at least for data sets that are not high-dimensional. More generally, we show that locality can be an important factor for substantially speeding up learning approaches and kernel methods, in contrast to other recent techniques that tend to dismiss local information in order to improve scalability.
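The abstract outlines the overall pipeline: cover the training set with redundant local neighbourhoods, train one local model per neighbourhood centre, and answer each query with the model whose centre is nearest. The following minimal sketch illustrates only that structure; it is not the paper's algorithm. The `stride`-based centre selection, the choice of `k`, and the majority-vote rule standing in for a local kernel SVM are all illustrative assumptions made to keep the sketch dependency-free.

```python
# Illustrative sketch of the local-models-plus-proximity-selection structure.
# A local majority vote stands in for the local SVM of FaLK-SVM.
import math
from collections import Counter

def train_local_models(X, y, k=3, stride=2):
    """Pick every `stride`-th training point as a centre (a hypothetical
    covering heuristic); each local 'model' is just the labels of the
    centre's k nearest training points."""
    models = []
    for i in range(0, len(X), stride):
        centre = X[i]
        neigh = sorted(range(len(X)), key=lambda j: math.dist(X[j], centre))[:k]
        models.append((centre, [y[j] for j in neigh]))
    return models

def predict(models, q):
    """Proximity strategy: use the local model whose centre is closest to q."""
    centre, local_labels = min(models, key=lambda m: math.dist(m[0], q))
    return Counter(local_labels).most_common(1)[0][0]

# Two well-separated clusters.
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = [0, 0, 0, 1, 1, 1]
models = train_local_models(X, y)
print(predict(models, (0.2, 0.2)))  # near the first cluster -> 0
print(predict(models, (5.5, 5.5)))  # near the second cluster -> 1
```

In the actual method, each local model is a kernel SVM trained on a k-neighbourhood, and both neighbourhood construction and the testing-time centre lookup are made scalable with cover-tree-style nearest-neighbour structures, which is where the speed-up over a single global SVM comes from.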

References

[1]
David W. Aha. Lazy Learning. Kluwer Academic Publishers, Norwell, MA, USA, 1997.
[2]
Erin L. Allwein, Robert E. Schapire, and Yoram Singer. Reducing multiclass to binary: A unifying approach for margin classifiers. Journal of Machine Learning Research, 1:113-141, 2000.
[3]
Arthur Asuncion and David Newman. UCI machine learning repository, 2007. URL http://www.ics.uci.edu/~mlearn/MLRepository.html.
[4]
Christopher G. Atkeson, Andrew W. Moore, and Stefan Schaal. Locally weighted learning. Artificial Intelligence Review, 11(1-5):11-73, 1997.
[5]
Yoshua Bengio, Olivier Delalleau, and Nicolas Le Roux. The curse of highly variable functions for local kernel machines. In Advances in Neural Information Processing Systems, volume 18, 2005.
[6]
Alina Beygelzimer, Sham Kakade, and John Langford. Cover trees for nearest neighbor. In Twenty-third International Conference on Machine Learning (ICML 06), pages 97-104, New York, NY, USA, 2006. ACM.
[7]
Jock A. Blackard and Denis J. Dean. Comparative accuracies of artificial neural networks and discriminant analysis in predicting forest cover types from cartographic variables. Computers and Electronics in Agriculture, 24:131-151, 1999.
[8]
Enrico Blanzieri and Anton Bryl. Evaluation of the highest probability SVM nearest neighbor classifier with variable relative error cost. In CEAS 2007, Mountain View, California, 2007.
[9]
Enrico Blanzieri and Farid Melgani. An adaptive SVM nearest neighbor classifier for remotely sensed imagery. In IEEE International Conference on Geoscience and Remote Sensing Symposium (IGARSS 06), pages 3931-3934, 2006.
[10]
Enrico Blanzieri and Farid Melgani. Nearest neighbor classification of remote sensing images with the maximal margin principle. IEEE Transactions on Geoscience and Remote Sensing, 46(6):1804-1811, 2008.
[11]
Antoine Bordes and Léon Bottou. The Huller: A simple and efficient online SVM. In Machine Learning: ECML 2005, Lecture Notes in Artificial Intelligence, LNAI 3720, pages 505-512. Springer Verlag, 2005.
[12]
Antoine Bordes, Seyda Ertekin, Jason Weston, and Léon Bottou. Fast kernel classifiers with online and active learning. Journal of Machine Learning Research, 6:1579-1619, 2005.
[13]
Antoine Bordes, Léon Bottou, and Patrick Gallinari. SGD-QN: Careful quasi-Newton stochastic gradient descent. Journal of Machine Learning Research, 10:1737-1754, July 2009.
[14]
Léon Bottou and Vladimir N. Vapnik. Local learning algorithms. Neural Computation, 4(6):888-900, 1992.
[15]
Léon Bottou, Corinna Cortes, John S. Denker, Harris Drucker, Isabelle Guyon, Lawrence D. Jackel, Yann Le Cun, Urs A. Muller, Eduard Säckinger, Patrice Simard, and Vladimir Vapnik. Comparison of classifier methods: A case study in handwritten digit recognition. In Twelfth IAPR International Conference on Pattern Recognition, Conference B: Computer Vision & Image Processing, volume 2, pages 77-82. IEEE, 1994.
[16]
Leo Breiman. Bagging predictors. Machine Learning, 24(2):123-140, August 1996.
[17]
David Broomhead and David Lowe. Multivariable functional interpolation and adaptive networks. Complex Systems, 2:321-355, 1988.
[18]
Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A Library for Support Vector Machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[19]
Kai-Wei Chang, Cho-Jui Hsieh, and Chih-Jen Lin. Coordinate descent method for large-scale L2-loss linear support vector machines. Journal of Machine Learning Research, 9:1369-1398, 2008.
[20]
Qun Chang, Qingcai Chen, and Xiaolong Wang. Scaling Gaussian RBF kernel width to improve SVM classification. International Conference on Neural Networks and Brain, 2005. (ICNN&B 05), 1:19-22, 2005.
[21]
Long Chen. New analysis of the sphere covering problems and optimal polytope approximation of convex bodies. Journal of Approximation Theory, 133(1):134, 2005.
[22]
Haibin Cheng, Pang-Ning Tan, and Rong Jin. Localized support vector machine and its efficient algorithm. SIAM International Conference on Data Mining, 2007.
[23]
Vasek Chvatal. A greedy heuristic for the set-covering problem. Mathematics of Operations Research, pages 233-235, 1979.
[24]
Kenneth L. Clarkson. Nearest neighbor queries in metric spaces. In Twenty-ninth Annual ACM Symposium on Theory of Computing (STOC 97), pages 609-617, New York, NY, USA, 1997. ACM.
[25]
Michael Collins, Amir Globerson, Terry Koo, Xavier Carreras, and Peter L. Bartlett. Exponentiated gradient algorithms for conditional random fields and max-margin markov networks. Journal of Machine Learning Research, 9:1775-1822, 2008.
[26]
Ronan Collobert, Fabian Sinz, Jason Weston, and Léon Bottou. Trading convexity for scalability. In Twenty-third International Conference on Machine Learning (ICML 06), pages 201-208, New York, NY, USA, 2006. ACM.
[27]
Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine Learning, 20(3):273-297, 1995.
[28]
Janez Demšar. Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7:1-30, 2006.
[29]
Jian-Xiong Dong, Adam Krzyzak, and Ching Y. Suen. Fast SVM training algorithm with decomposition on very large data sets. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(4):603-618, 2005.
[30]
Rong-En Fan, Pai-Hsuen Chen, and Chih-Jen Lin. Working set selection using second order information for training support vector machines. The Journal of Machine Learning Research, 6:1889-1918, 2005.
[31]
Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. LIBLINEAR: A library for large linear classification. The Journal of Machine Learning Research, 9:1871-1874, 2008.
[32]
Milton Friedman. A comparison of alternative tests of significance for the problem of m rankings. The Annals of Mathematical Statistics, 11(1):86-92, 1940.
[33]
Michael R. Garey and David S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., New York, NY, USA, 1979.
[34]
Seth Hettich and Stephen D. Bay. The UCI KDD archive, 1999. URL http://kdd.ics.uci.edu.
[35]
Cho-Jui Hsieh, Kai-Wei Chang, Chih-Jen Lin, S. Sathiya Keerthi, and S. Sundararajan. A dual coordinate descent method for large-scale linear SVM. In Twenty-fifth International Conference on Machine Learning (ICML 08), pages 408-415, New York, NY, USA, 2008. ACM.
[36]
Chih-Wei Hsu and Chih-Jen Lin. A comparison of methods for multiclass support vector machines. IEEE Transactions on Neural Networks, 13(2):415-425, 2002.
[37]
Thorsten Joachims. Making large-scale support vector machine learning practical. Advances in kernel methods: support vector learning, pages 169-184, 1999.
[38]
Thorsten Joachims. Training linear SVMs in linear time. In Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 217-226. ACM, New York, NY, USA, 2006.
[39]
Thorsten Joachims and Chun-Nam Yu. Sparse kernel SVMs via cutting-plane training. Machine Learning, 2009.
[40]
Thorsten Joachims, Thomas Finley, and Chun-Nam John Yu. Cutting-plane training of structural SVMs. Machine Learning, 77(1):27-59, 2009.
[41]
Michael J. Kearns and Umesh V. Vazirani. An Introduction to Computational Learning Theory. MIT Press, Cambridge, MA, USA, 1994.
[42]
Sathiya Keerthi and Dennis DeCoste. A modified finite Newton method for fast solution of large scale linear SVMs. Journal of Machine Learning Research, 6:341-361, 2005.
[43]
Sathiya Keerthi, Olivier Chapelle, and Dennis DeCoste. Building support vector machines with reduced classifier complexity. Journal of Machine Learning Research, 7:1493-1515, 2006.
[44]
Stefan Knerr, Leon Personnaz, and Gerard Dreyfus. Single-layer learning revisited: A stepwise procedure for building and training a neural network. Optimization Methods and Software, 1:23-34, 1990.
[45]
Robert Krauthgamer and James R. Lee. Navigating nets: simple algorithms for proximity search. In Fifteenth Annual ACM-SIAM Symposium on Discrete algorithms (SODA 04), pages 798-807, Philadelphia, PA, USA, 2004. Society for Industrial and Applied Mathematics.
[46]
Ulrich H.-G. Kressel. Pairwise classification and support vector machines. Advances in Kernel Methods: Support Vector Learning, pages 255-268, 1999.
[47]
Yuh-jye Lee and Olvi L. Mangasarian. RSVM: Reduced support vector machines. In First SIAM International Conference on Data Mining, 2001.
[48]
Chih-Jen Lin, Ruby C. Weng, and S. Sathiya Keerthi. Trust region Newton methods for large-scale logistic regression. In Twenty-fourth International Conference on Machine learning (ICML 07), pages 561-568, New York, NY, USA, 2007. ACM.
[49]
Gaëlle Loosli and Stéphane Canu. Comments on the "Core vector machines: Fast SVM training on very large data sets". Journal of Machine Learning Research, 8:291-301, 2007.
[50]
Olvi L. Mangasarian. A finite Newton method for classification. Optimization Methods and Software, 17(5):913-929, 2002.
[51]
Mario Marchand and John Shawe-Taylor. The set covering machine. The Journal of Machine Learning Research, 3:723-746, 2003.
[52]
Donald Michie, David J. Spiegelhalter, Charles C. Taylor, and John Campbell, editors. Machine Learning, Neural and Statistical Classification. Ellis Horwood, Upper Saddle River, NJ, USA, 1994. ISBN 0-13-106360-X.
[53]
Paul Nemenyi. Distribution-Free Multiple Comparisons. PhD thesis, Princeton University, 1963.
[54]
John C. Platt, Nello Cristianini, and John Shawe-Taylor. Large margin DAGs for multiclass classification. Advances in Neural Information Processing Systems, 12(3):547-553, 2000.
[55]
Bernhard Schölkopf and Alexander J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge, MA, USA, 2002.
[56]
Nicola Segata. FaLKM-lib v1.0: A library for fast local kernel machines. Technical Report DISI-09-025, id 1613, DISI, University of Trento, Italy, 2009. Software available at http://disi.unitn.it/~segata/FaLKM-lib.
[57]
Nicola Segata and Enrico Blanzieri. Empirical assessment of classification accuracy of Local SVM. In Eighteenth Annual Belgian-Dutch Conference on Machine Learning (Benelearn 2009), pages 47-55, 2009a.
[58]
Nicola Segata and Enrico Blanzieri. Operators for transforming kernels into quasi-local kernels that improve SVM accuracy. Technical Report DISI-09-042, id 1652, DISI, University of Trento, 2009b.
[59]
Nicola Segata and Enrico Blanzieri. Fast local support vector machines for large datasets. In International Conference on Machine Learning and Data Mining (MLDM 2009), volume 5632 of Lecture Notes in Computer Science. Springer, 2009c.
[60]
Nicola Segata, Enrico Blanzieri, and Pádraig Cunningham. A scalable noise reduction technique for large case-based systems. In L. McGinty and D. C. Wilson, editors, Case-Based Reasoning Research and Development: 8th International Conference on Case-Based Reasoning (ICCBR 09), Lecture Notes in Artificial Intelligence, pages 755-758. Springer, 2009a.
[61]
Nicola Segata, Enrico Blanzieri, Sarah Jane Delany, and Pádraig Cunningham. Noise reduction for instance-based learning with a local maximal margin approach. Journal of Intelligent Information Systems, 2009b. In Press.
[62]
Shai Shalev-Shwartz, Yoram Singer, and Nathan Srebro. Pegasos: Primal estimated sub-gradient solver for SVM. In Twenty-fourth International Conference on Machine learning (ICML 07), pages 807-814, New York, NY, USA, 2007. ACM.
[63]
Alexander J. Smola, S. V. N. Vishwanathan, and Quoc V. Le. Bundle methods for machine learning. Advances in Neural Information Processing Systems, 20:1377-1384, 2008.
[64]
Michael E. Thompson. NDCC: Normally distributed clustered datasets on cubes, 2006. www.cs.wisc.edu/dmi/svm/ndcc/.
[65]
Ivor W. Tsang, James T. Kwok, and Pak-Ming Cheung. Core vector machines: Fast SVM training on very large data sets. The Journal of Machine Learning Research, 6:363-392, 2005.
[66]
Ivor W. Tsang, Andras Kocsor, and James T. Kwok. Simpler core vector machines with enclosing balls. In Twenty-fourth International Conference on Machine Learning (ICML 07), pages 911-918, New York, NY, USA, 2007. ACM.
[67]
Andrew V. Uzilov, Joshua M. Keegan, and David H. Mathews. Detection of non-coding RNAs on the basis of predicted secondary structure formation free energy change. BMC Bioinformatics, 7(1):173, 2006.
[68]
Vladimir N. Vapnik. The Nature of Statistical Learning Theory. Springer, 2000.
[69]
Vladimir N. Vapnik and Léon Bottou. Local algorithms for pattern recognition and dependencies estimation. Neural Computation, 5(6):893-909, 1993.
[70]
Jigang Wang, Predrag Neskovic, and Leon N. Cooper. A minimum sphere covering approach to pattern classification. International Conference on Pattern Recognition, 3:433-436, 2006.
[71]
Xun-Kai Wei and Ying-Hong Li. Linear programming minimum sphere set covering for extreme learning machines. Neurocomputing, 71(4-6):570-575, 2008.
[72]
Frank Wilcoxon. Individual comparisons by ranking methods. Biometrics Bulletin, 1(6):80-83, 1945.
[73]
Tao Yang and Vojislav Kecman. Adaptive local hyperplane classification. Neurocomputing, 71(13-15):3001-3004, 2008.
[74]
Tao Yang and Vojislav Kecman. Adaptive local hyperplane algorithm for learning small medical data sets. Expert Systems, 26(4):355-359, 2009.
[75]
Tao Yang and Vojislav Kecman. Face recognition with adaptive local hyperplane algorithm. Pattern Analysis & Applications, 13(1):79-83, 2010. ISSN 1433-7541.
[76]
Alan L. Yuille and Anand Rangarajan. The concave-convex procedure. Neural Computation, 15(4):915-936, 2003.
[77]
Alon Zakai and Ya'acov Ritov. Consistency and localizability. Journal of Machine Learning Research, 10:827-856, 2009.
[78]
Luca Zanni, Thomas Serafini, and Gaetano Zanghirati. Parallel software for training large scale support vector machines on multiprocessor systems. Journal of Machine Learning Research, 7:1467-1492, 2006.
[79]
Hao Zhang, Alexander C. Berg, Michael Maire, and Jitendra Malik. SVM-KNN: Discriminative nearest neighbor classification for visual category recognition. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2:2126-2136, 2006.

Cited By

  • (2023) Cover Trees Revisited: Exploiting Unused Distance and Direction Information. IEEE Transactions on Knowledge and Data Engineering, 35(11):11231-11245. DOI: 10.1109/TKDE.2022.3231781
  • (2019) LGND: a new method for multi-class novelty detection. Neural Computing and Applications, 31(8):3339-3355. DOI: 10.1007/s00521-017-3270-7
  • (2018) CROification: Accurate Kernel Classification with the Efficiency of Sparse Linear SVM. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(1):34-48. DOI: 10.1109/TPAMI.2017.2785313


Published In

The Journal of Machine Learning Research, Volume 11 (3/1/2010), 3637 pages.
ISSN: 1532-4435
EISSN: 1533-7928

Publisher

JMLR.org

