
Fast and Scalable Local Kernel Machines

Published: 01 August 2010

Abstract

A computationally efficient approach to local learning with kernel methods is presented. The Fast Local Kernel Support Vector Machine (FaLK-SVM) trains a set of local SVMs on redundant neighbourhoods of the training set, and an appropriate model for each query point is selected at testing time according to a proximity strategy. Supported by a recent result of Zakai and Ritov (2009) relating consistency and localizability, our approach achieves high classification accuracy by dividing the separation function into local optimisation problems that can be handled very efficiently from the computational viewpoint. The introduction of a fast local model selection further speeds up the learning process. Learning and complexity bounds are derived for FaLK-SVM, and the empirical evaluation of the approach (on data sets of up to 3 million points) shows that it is much faster, more accurate, and more scalable than state-of-the-art exact and approximate SVM solvers, at least for data sets that are not high-dimensional. More generally, we show that locality can be an important factor for substantially speeding up learning approaches and kernel methods, in contrast to other recent techniques that tend to dismiss local information in order to improve scalability.
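The abstract outlines the overall pipeline: cover the training set with redundant local neighbourhoods, train one local model per neighbourhood centre, and answer each query with the model whose centre is nearest. The following minimal sketch illustrates only that structure; it is not the paper's algorithm. The `stride`-based centre selection, the choice of `k`, and the majority-vote rule standing in for a local kernel SVM are all illustrative assumptions made to keep the sketch dependency-free.

```python
# Illustrative sketch of the local-models-plus-proximity-selection structure.
# A local majority vote stands in for the local SVM of FaLK-SVM.
import math
from collections import Counter

def train_local_models(X, y, k=3, stride=2):
    """Pick every `stride`-th training point as a centre (a hypothetical
    covering heuristic); each local 'model' is just the labels of the
    centre's k nearest training points."""
    models = []
    for i in range(0, len(X), stride):
        centre = X[i]
        neigh = sorted(range(len(X)), key=lambda j: math.dist(X[j], centre))[:k]
        models.append((centre, [y[j] for j in neigh]))
    return models

def predict(models, q):
    """Proximity strategy: use the local model whose centre is closest to q."""
    centre, local_labels = min(models, key=lambda m: math.dist(m[0], q))
    return Counter(local_labels).most_common(1)[0][0]

# Two well-separated clusters.
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = [0, 0, 0, 1, 1, 1]
models = train_local_models(X, y)
print(predict(models, (0.2, 0.2)))  # near the first cluster -> 0
print(predict(models, (5.5, 5.5)))  # near the second cluster -> 1
```

In the actual method, each local model is a kernel SVM trained on a k-neighbourhood, and both neighbourhood construction and the testing-time centre lookup are made scalable with cover-tree-style nearest-neighbour structures, which is where the speed-up over a single global SVM comes from.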

References

[1]
David W. Aha. Lazy Learning. Kluwer Academic Publishers, Norwell, MA, USA, 1997.
[2]
Erin L. Allwein, Robert E. Schapire, and Yoram Singer. Reducing multiclass to binary: A unifying approach for margin classifiers. Journal of Machine Learning Research, 1:113-141, 2000.
[3]
Arthur Asuncion and David Newman. UCI machine learning repository, 2007. URL http://www.ics.uci.edu/~mlearn/MLRepository.html.
[4]
Christopher G. Atkeson, Andrew W. Moore, and Stefan Schaal. Locally weighted learning. Artificial Intelligence Review, 11(1-5):11-73, 1997.
[5]
Yoshua Bengio, Olivier Delalleau, and Nicolas Le Roux. The curse of highly variable functions for local kernel machines. In Advances in Neural Information Processing Systems, volume 18, 2005.
[6]
Alina Beygelzimer, Sham Kakade, and John Langford. Cover trees for nearest neighbor. In Twenty-third International Conference on Machine Learning (ICML 06), pages 97-104, New York, NY, USA, 2006. ACM.
[7]
Jock A. Blackard and Denis J. Dean. Comparative accuracies of artificial neural networks and discriminant analysis in predicting forest cover types from cartographic variables. Computers and Electronics in Agriculture, 24:131-151, 1999.
[8]
Enrico Blanzieri and Anton Bryl. Evaluation of the highest probability SVM nearest neighbor classifier with variable relative error cost. In CEAS 2007, Mountain View, California, 2007.
[9]
Enrico Blanzieri and Farid Melgani. An adaptive SVM nearest neighbor classifier for remotely sensed imagery. In IEEE International Conference on Geoscience and Remote Sensing Symposium (IGARSS 06), pages 3931-3934, 2006.
[10]
Enrico Blanzieri and Farid Melgani. Nearest neighbor classification of remote sensing images with the maximal margin principle. IEEE Transactions on Geoscience and Remote Sensing, 46(6):1804-1811, 2008.
[11]
Antoine Bordes and Léon Bottou. The Huller: A simple and efficient online SVM. In Machine Learning: ECML 2005, Lecture Notes in Artificial Intelligence, LNAI 3720, pages 505-512. Springer Verlag, 2005.
[12]
Antoine Bordes, Seyda Ertekin, Jason Weston, and Léon Bottou. Fast kernel classifiers with online and active learning. Journal of Machine Learning Research, 6:1579-1619, 2005.
[13]
Antoine Bordes, Léon Bottou, and Patrick Gallinari. SGD-QN: Careful quasi-Newton stochastic gradient descent. Journal of Machine Learning Research, 10:1737-1754, July 2009.
[14]
Léon Bottou and Vladimir N. Vapnik. Local learning algorithms. Neural Computation, 4(6):888-900, 1992.
[15]
Léon Bottou, Corinna Cortes, John S. Denker, Harris Drucker, Isabelle Guyon, Lawrence D. Jackel, Yann Le Cun, Urs A. Muller, Eduard Säckinger, Patrice Simard, and Vladimir Vapnik. Comparison of classifier methods: A case study in handwritten digit recognition. In Twelfth IAPR International Conference on Pattern Recognition, Conference B: Computer Vision & Image Processing, volume 2, pages 77-82. IEEE, 1994.
[16]
Leo Breiman. Bagging predictors. Machine Learning, 24(2):123-140, August 1996.
[17]
David Broomhead and David Lowe. Multivariable functional interpolation and adaptive networks. Complex Systems, 2:321-355, 1988.
[18]
Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A Library for Support Vector Machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[19]
Kai-Wei Chang, Cho-Jui Hsieh, and Chih-Jen Lin. Coordinate descent method for large-scale L2-loss linear support vector machines. Journal of Machine Learning Research, 9:1369-1398, 2008.
[20]
Qun Chang, Qingcai Chen, and Xiaolong Wang. Scaling Gaussian RBF kernel width to improve SVM classification. International Conference on Neural Networks and Brain, 2005. (ICNN&B 05), 1:19-22, 2005.
[21]
Long Chen. New analysis of the sphere covering problems and optimal polytope approximation of convex bodies. Journal of Approximation Theory, 133(1):134, 2005.
[22]
Haibin Cheng, Pang-Ning Tan, and Rong Jin. Localized support vector machine and its efficient algorithm. SIAM International Conference on Data Mining, 2007.
[23]
Vasek Chvatal. A greedy heuristic for the set-covering problem. Mathematics of Operations Research, pages 233-235, 1979.
[24]
Kenneth L. Clarkson. Nearest neighbor queries in metric spaces. In Twenty-ninth Annual ACM Symposium on Theory of Computing (STOC 97), pages 609-617, New York, NY, USA, 1997. ACM.
[25]
Michael Collins, Amir Globerson, Terry Koo, Xavier Carreras, and Peter L. Bartlett. Exponentiated gradient algorithms for conditional random fields and max-margin markov networks. Journal of Machine Learning Research, 9:1775-1822, 2008.
[26]
Ronan Collobert, Fabian Sinz, Jason Weston, and Léon Bottou. Trading convexity for scalability. In Twenty-third International Conference on Machine Learning (ICML 06), pages 201-208, New York, NY, USA, 2006. ACM.
[27]
Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine Learning, 20(3):273-297, 1995.
[28]
Janez Demšar. Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7:1-30, 2006.
[29]
Jian-Xiong Dong, Adam Krzyzak, and Ching Y. Suen. Fast SVM training algorithm with decomposition on very large data sets. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(4):603-618, 2005.
[30]
Rong-En Fan, Pai-Hsuen Chen, and Chih-Jen Lin. Working set selection using second order information for training support vector machines. The Journal of Machine Learning Research, 6:1889-1918, 2005.
[31]
Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. LIBLINEAR: A library for large linear classification. The Journal of Machine Learning Research, 9:1871-1874, 2008.
[32]
Milton Friedman. A comparison of alternative tests of significance for the problem of m rankings. The Annals of Mathematical Statistics, 11(1):86-92, 1940.
[33]
Michael R. Garey and David S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., New York, NY, USA, 1979.
[34]
Seth Hettich and Stephen D. Bay. The UCI KDD archive, 1999. URL http://kdd.ics.uci.edu.
[35]
Cho-Jui Hsieh, Kai-Wei Chang, Chih-Jen Lin, S. Sathiya Keerthi, and S. Sundararajan. A dual coordinate descent method for large-scale linear SVM. In Twenty-fifth International Conference on Machine Learning (ICML 08), pages 408-415, New York, NY, USA, 2008. ACM.
[36]
Chih-Wei Hsu and Chih-Jen Lin. A comparison of methods for multiclass support vector machines. IEEE Transactions on Neural Networks, 13(2):415-425, 2002.
[37]
Thorsten Joachims. Making large-scale support vector machine learning practical. Advances in kernel methods: support vector learning, pages 169-184, 1999.
[38]
Thorsten Joachims. Training linear SVMs in linear time. In Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 217-226. ACM, New York, NY, USA, 2006.
[39]
Thorsten Joachims and Chun-Nam Yu. Sparse kernel SVMs via cutting-plane training. Machine Learning, 2009.
[40]
Thorsten Joachims, Thomas Finley, and Chun-Nam John Yu. Cutting-plane training of structural SVMs. Machine Learning, 77(1):27-59, 2009.
[41]
Michael J. Kearns and Umesh V. Vazirani. An Introduction to Computational Learning Theory. MIT Press, Cambridge, MA, USA, 1994.
[42]
Sathiya Keerthi and Dennis DeCoste. A modified finite Newton method for fast solution of large scale linear SVMs. Journal of Machine Learning Research, 6:341-361, 2005.
[43]
Sathiya Keerthi, Olivier Chapelle, and Dennis DeCoste. Building support vector machines with reduced classifier complexity. Journal of Machine Learning Research, 7:1493-1515, 2006.
[44]
Stefan Knerr, Leon Personnaz, and Gerard Dreyfus. Single-layer learning revisited: A stepwise procedure for building and training a neural network. Optimization Methods and Software, 1:23-34, 1990.
[45]
Robert Krauthgamer and James R. Lee. Navigating nets: simple algorithms for proximity search. In Fifteenth Annual ACM-SIAM Symposium on Discrete algorithms (SODA 04), pages 798-807, Philadelphia, PA, USA, 2004. Society for Industrial and Applied Mathematics.
[46]
Ulrich H.-G. Kressel. Pairwise classification and support vector machines. Advances in Kernel Methods: Support Vector Learning, pages 255-268, 1999.
[47]
Yuh-jye Lee and Olvi L. Mangasarian. RSVM: Reduced support vector machines. In First SIAM International Conference on Data Mining, 2001.
[48]
Chih-Jen Lin, Ruby C. Weng, and S. Sathiya Keerthi. Trust region Newton methods for large-scale logistic regression. In Twenty-fourth International Conference on Machine learning (ICML 07), pages 561-568, New York, NY, USA, 2007. ACM.
[49]
Gaëlle Loosli and Stéphane Canu. Comments on the "Core vector machines: Fast SVM training on very large data sets". Journal of Machine Learning Research, 8:291-301, 2007.
[50]
Olvi L. Mangasarian. A finite Newton method for classification. Optimization Methods and Software, 17(5):913-929, 2002.
[51]
Mario Marchand and John Shawe-Taylor. The set covering machine. The Journal of Machine Learning Research, 3:723-746, 2003.
[52]
Donald Michie, David J. Spiegelhalter, Charles C. Taylor, and John Campbell, editors. Machine Learning, Neural and Statistical Classification. Ellis Horwood, Upper Saddle River, NJ, USA, 1994. ISBN 0-13-106360-X.
[53]
Paul Nemenyi. Distribution-Free Multiple Comparisons. PhD thesis, Princeton University, 1963.
[54]
John C. Platt, Nello Cristianini, and John Shawe-Taylor. Large margin DAGs for multiclass classification. Advances in Neural Information Processing Systems, 12(3):547-553, 2000.
[55]
Bernhard Schölkopf and Alexander J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge, MA, USA, 2002.
[56]
Nicola Segata. FaLKM-lib v1.0: A library for fast local kernel machines. Technical Report DISI-09-025, id 1613, DISI, University of Trento, Italy, 2009. Software available at http://disi.unitn.it/~segata/FaLKM-lib.
[57]
Nicola Segata and Enrico Blanzieri. Empirical assessment of classification accuracy of Local SVM. In Eighteenth Annual Belgian-Dutch Conference on Machine Learning (Benelearn 2009), pages 47-55, 2009a.
[58]
Nicola Segata and Enrico Blanzieri. Operators for transforming kernels into quasi-local kernels that improve SVM accuracy. Technical Report DISI-09-042, id 1652, DISI, University of Trento, 2009b.
[59]
Nicola Segata and Enrico Blanzieri. Fast local support vector machines for large datasets. In International Conference on Machine Learning and Data Mining (MLDM 2009), volume 5632 of Lecture Notes in Computer Science. Springer, 2009c.
[60]
Nicola Segata, Enrico Blanzieri, and Pádraig Cunningham. A scalable noise reduction technique for large case-based systems. In L. McGinty and D. C. Wilson, editors, Case-Based Reasoning Research and Development: 8th International Conference on Case-Based Reasoning (ICCBR 09), Lecture Notes in Artificial Intelligence, pages 755-758. Springer, 2009a.
[61]
Nicola Segata, Enrico Blanzieri, Sarah Jane Delany, and Pádraig Cunningham. Noise reduction for instance-based learning with a local maximal margin approach. Journal of Intelligent Information Systems, 2009b. In Press.
[62]
Shai Shalev-Shwartz, Yoram Singer, and Nathan Srebro. Pegasos: Primal estimated sub-gradient solver for SVM. In Twenty-fourth International Conference on Machine learning (ICML 07), pages 807-814, New York, NY, USA, 2007. ACM.
[63]
Alexander J. Smola, S. V. N. Vishwanathan, and Quoc V. Le. Bundle methods for machine learning. Advances in Neural Information Processing Systems, 20:1377-1384, 2008.
[64]
Michael E. Thompson. NDCC: Normally distributed clustered datasets on cubes, 2006. www.cs.wisc.edu/dmi/svm/ndcc/.
[65]
Ivor W. Tsang, James T. Kwok, and Pak-Ming Cheung. Core vector machines: Fast SVM training on very large data sets. The Journal of Machine Learning Research, 6:363-392, 2005.
[66]
Ivor W. Tsang, Andras Kocsor, and James T. Kwok. Simpler core vector machines with enclosing balls. In Twenty-fourth International Conference on Machine Learning (ICML 07), pages 911-918, New York, NY, USA, 2007. ACM.
[67]
Andrew V. Uzilov, Joshua M. Keegan, and David H. Mathews. Detection of non-coding RNAs on the basis of predicted secondary structure formation free energy change. BMC Bioinformatics, 7(1):173, 2006.
[68]
Vladimir N. Vapnik. The Nature of Statistical Learning Theory. Springer, 2000.
[69]
Vladimir N. Vapnik and Léon Bottou. Local algorithms for pattern recognition and dependencies estimation. Neural Computation, 5(6):893-909, 1993.
[70]
Jigang Wang, Predrag Neskovic, and Leon N. Cooper. A minimum sphere covering approach to pattern classification. International Conference on Pattern Recognition, 3:433-436, 2006.
[71]
Xun-Kai Wei and Ying-Hong Li. Linear programming minimum sphere set covering for extreme learning machines. Neurocomputing, 71(4-6):570-575, 2008.
[72]
Frank Wilcoxon. Individual comparisons by ranking methods. Biometrics Bulletin, 1(6):80-83, 1945.
[73]
Tao Yang and Vojislav Kecman. Adaptive local hyperplane classification. Neurocomputing, 71(13-15):3001-3004, 2008.
[74]
Tao Yang and Vojislav Kecman. Adaptive local hyperplane algorithm for learning small medical data sets. Expert Systems, 26(4):355-359, 2009.
[75]
Tao Yang and Vojislav Kecman. Face recognition with adaptive local hyperplane algorithm. Pattern Analysis & Applications, 13(1):79-83, 2010. ISSN 1433-7541.
[76]
Alan L. Yuille and Anand Rangarajan. The concave-convex procedure. Neural Computation, 15(4):915-936, 2003.
[77]
Alon Zakai and Ya'acov Ritov. Consistency and localizability. Journal of Machine Learning Research, 10:827-856, 2009.
[78]
Luca Zanni, Thomas Serafini, and Gaetano Zanghirati. Parallel software for training large scale support vector machines on multiprocessor systems. Journal of Machine Learning Research, 7:1467-1492, 2006.
[79]
Hao Zhang, Alexander C. Berg, Michael Maire, and Jitendra Malik. SVM-KNN: Discriminative nearest neighbor classification for visual category recognition. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2:2126-2136, 2006.

Cited By

  • (2023) Cover Trees Revisited: Exploiting Unused Distance and Direction Information. IEEE Transactions on Knowledge and Data Engineering, 35(11):11231-11245. DOI: 10.1109/TKDE.2022.3231781
  • (2019) LGND: a new method for multi-class novelty detection. Neural Computing and Applications, 31(8):3339-3355. DOI: 10.1007/s00521-017-3270-7
  • (2018) CROification: Accurate Kernel Classification with the Efficiency of Sparse Linear SVM. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(1):34-48. DOI: 10.1109/TPAMI.2017.2785313


Published In

The Journal of Machine Learning Research, Volume 11 (3/1/2010), 3637 pages.
ISSN: 1532-4435
EISSN: 1533-7928

Publisher

JMLR.org

