
A convex parametrization of a new class of universal kernel functions. (English) Zbl 1498.68230

Summary: The accuracy and complexity of a kernel learning algorithm are determined by the set of kernels over which it is able to optimize. An ideal set of kernels should: admit a linear parameterization (tractability); be dense in the set of all kernels (accuracy); and contain only universal kernels, so that the hypothesis space is infinite-dimensional (scalability). Currently, no class of kernels meets all three criteria: Gaussians, for example, are neither tractable nor accurate, and polynomials are not scalable. We propose a new class that meets all three criteria, the Tessellated Kernel (TK) class. Specifically, the TK class: admits a linear parameterization using positive matrices; is dense in the set of all kernels; and every element of the class is universal. This implies that using TK kernels for learning the kernel can obviate the need to select candidate kernels in algorithms such as SimpleMKL, as well as parameters such as the bandwidth. Numerical testing on soft-margin Support Vector Machine (SVM) problems shows that algorithms using TK kernels outperform other kernel learning algorithms and neural networks. Furthermore, our results show that when the ratio of the number of training data to the number of features is high, the improvement of TK over MKL increases significantly.
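The key structural claim above is that the kernel class admits a linear parameterization using positive (semidefinite) matrices: any kernel of the form k(x, y) = Z(x)ᵀ P Z(y) with P ⪰ 0 is a valid kernel, and k depends linearly on P, which is what makes optimizing over the class a semidefinite program. The sketch below illustrates only this parameterization idea; `feature_map` is a hypothetical stand-in, not the paper's actual tessellated (piecewise-polynomial) feature construction.

```python
import numpy as np

def feature_map(x):
    """Toy piecewise-polynomial features of a scalar input x.
    (Hypothetical stand-in for the TK feature map Z(x); the paper's
    construction partitions the domain and uses monomials on each cell.)"""
    return np.array([1.0, x, x**2, max(x, 0.0), max(x, 0.0)**2])

def kernel(x, y, P):
    """k(x, y) = Z(x)^T P Z(y): a valid kernel for any PSD P, linear in P."""
    return feature_map(x) @ P @ feature_map(y)

# Any positive semidefinite P yields a positive semidefinite Gram matrix,
# since G = Phi P Phi^T with Phi stacking the feature vectors row-wise.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
P = A @ A.T                      # P = A A^T is PSD by construction

X = np.linspace(-1.0, 1.0, 20)
G = np.array([[kernel(x, y, P) for y in X] for x in X])

print(np.allclose(G, G.T))                        # symmetric
print(np.linalg.eigvalsh(G).min() >= -1e-9)       # PSD up to roundoff
```

Because k is linear in P, a kernel-learning objective that is convex in the Gram matrix remains convex in P, with P ⪰ 0 as the only conic constraint; this is the tractability property the summary refers to.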

MSC:

68T05 Learning and adaptive systems in artificial intelligence
62H30 Classification and discrimination; cluster analysis (statistical aspects)
90C22 Semidefinite programming

Software:

Mosek

References:

[1] F. Alizadeh, J.-P. Haeberly, and M. Overton. Primal-dual interior-point methods for semidefinite programming: convergence rates, stability and numerical results. SIAM · Zbl 0911.65047
[2] MOSEK ApS. The MOSEK optimization toolbox for MATLAB manual. Version 7.1 (Revision 28), 2015.
[3] K. M. Borgwardt, A. Gretton, M. J. Rasch, H. Kriegel, B. Schölkopf, and A. J. Smola. Integrating structured biological data by kernel maximum mean discrepancy. Bioinformatics, 2006.
[4] A. Caponnetto, C. Micchelli, M. Pontil, and Y. Ying. Universal multi-task kernels. Journal of Machine Learning Research, 2008. · Zbl 1225.68155
[5] M. Collins and N. Duffy. Convolution kernels for natural language. In Advances in Neural Information Processing Systems, 2002.
[6] C. Cortes, M. Mohri, and A. Rostamizadeh. Learning non-linear combinations of kernels. In Advances in Neural Information Processing Systems, 2009. · Zbl 1283.68286
[7] C. Cortes, M. Mohri, and A. Rostamizadeh. Algorithms for learning kernels based on centered alignment. Journal of Machine Learning Research, 2012. · Zbl 1283.68286
[8] A. C. Doherty, P. A. Parrilo, and F. M. Spedalieri. Complete family of separability criteria. Physical Review A, 2004.
[9] E. Eskin, J. Weston, W. Noble, and C. Leslie. Mismatch string kernels for SVM protein classification. In Advances in Neural Information Processing Systems, 2003.
[10] K. Gai, G. Chen, and C. S. Zhang. Learning kernels with radiuses of minimum enclosing balls. In Advances in Neural Information Processing Systems, 2010.
[11] T. Gärtner, P. Flach, and S. Wrobel. On graph kernels: Hardness results and efficient alternatives. In Learning Theory and Kernel Machines. 2003.
[12] M. Gasca and T. Sauer. On the history of multivariate polynomial interpolation. In Numerical Analysis: Historical Developments in the 20th Century. 2001. · Zbl 0968.65008
[13] M. Gönen and E. Alpaydin. Localized multiple kernel learning. In Proceedings of the International Conference on Machine Learning, 2008. · Zbl 1175.68308
[14] M. Gönen and E. Alpaydın. Multiple kernel learning algorithms. Journal of Machine Learning Research, 2011. · Zbl 1280.68167
[15] D. Haussler. Convolution kernels on discrete structures. Technical report, University of California, Santa Cruz, 1999.
[16] A. Jain, S. Vishwanathan, and M. Varma. SPF-GMKL: generalized multiple kernel learning with a million kernels. In Proceedings of the ACM International Conference on Knowledge
[17] G. Lanckriet, N. Cristianini, P. Bartlett, L. El Ghaoui, and M. Jordan. Learning the kernel matrix with semidefinite programming. Journal of Machine Learning Research, 2004. · Zbl 1222.68241
[18] K. Lang. Learning to tell two spirals apart. In Proceedings of the Connectionist Models Summer School, 1988.
[19] H. Lodhi, C. Saunders, J. Shawe-Taylor, N. Cristianini, and C. Watkins. Text classification using string kernels. Journal of Machine Learning Research, 2002. · Zbl 1013.68176
[20] C. Micchelli, Y. Xu, and H. Zhang. Universal kernels. Journal of Machine Learning Research, 2006. · Zbl 1222.68266
[21] C. S. Ong, A. J. Smola, and R. C. Williamson. Learning the kernel with hyperkernels. Journal of Machine Learning Research, 2005. · Zbl 1222.68277
[22] M. M. Peet, A. Papachristodoulou, and S. Lall. Positive forms and stability of linear time-delay systems. SIAM Journal on Control and Optimization, 2009. · Zbl 1187.34101
[23] A. Rahimi and B. Recht. Random features for large-scale kernel machines. In Advances in Neural Information Processing Systems, pages 1177-1184, 2008.
[24] A. Rakotomamonjy, F. R. Bach, S. Canu, and Y. Grandvalet. SimpleMKL. Journal of Machine Learning Research, 2008.
[25] B. Recht. Convex Modeling with Priors. PhD thesis, Massachusetts Institute of Technology, 2006.
[26] B. Schölkopf, A. J. Smola, and F. Bach. Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT Press, 2002.
[27] B. Shekhtman. Why piecewise linear functions are dense in C[0, 1]. Journal of Approximation Theory, 1982. · Zbl 0601.41033
[28] S. Sonnenburg, S. Henschel, C. Widmer, J. Behr, A. Zien, F. de Bona, A. Binder, C. Gehl, V. Franc, et al. The SHOGUN machine learning toolbox. Journal of Machine Learning · Zbl 1242.68003
[29] N. Subrahmanya and Y. Shin. Sparse multiple kernel learning for signal processing applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010.
[30] H. Sun. Mercer theorem for RKHS on noncompact sets. Journal of Complexity, 2005. · Zbl 1094.46021
[31] H. Wang, Q. Xiao, and D. Zhou. An approximation theory approach to learning with ℓ1 regularization. Journal of Approximation Theory, 2013. · Zbl 1283.68308
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases, these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.