
Distributed learning with indefinite kernels. (English) Zbl 1440.68238

Summary: We investigate distributed learning with a coefficient-based regularization scheme under the framework of kernel regression methods. Compared with classical kernel ridge regression (KRR), the algorithm under consideration does not require the kernel function to be positive semi-definite and hence provides a simple paradigm for designing indefinite kernel methods. The distributed learning approach partitions a massive data set into several disjoint subsets and then produces a global estimator by averaging the local estimators computed on the subsets. Since partitioning is easy to carry out and the algorithm can run on the subsets in parallel, this yields a substantial reduction in computation time compared with running the original algorithm on the entire sample. We establish the first minimax-optimal rates of convergence for the distributed coefficient-based regularization scheme with indefinite kernels. We thus demonstrate that, compared with distributed KRR, the algorithm in question is more flexible and effective for regression problems with large-scale data sets.
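A minimal sketch of the divide-and-conquer scheme described above, assuming the usual coefficient-based regularization in which the local coefficient vector solves (K^T K + λ n^2 I) α = K^T y, a linear system that remains well posed even when the kernel matrix K is indefinite. The sigmoid (tanh) kernel, the λ n^2 regularization scaling, and all identifiers below are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

def sigmoid_kernel(X, Z, a=1.0, b=-1.0):
    # tanh kernel: a classic example of an indefinite (non-PSD) kernel
    return np.tanh(a * X @ Z.T + b)

def local_estimator(X, y, lam, kernel):
    # Coefficient-based regularization on one subset: minimize
    #   (1/n) * ||K a - y||^2 + lam * n * ||a||^2
    # via the normal equations (K^T K + lam * n^2 * I) a = K^T y,
    # which are solvable even if K is indefinite.
    n = X.shape[0]
    K = kernel(X, X)
    alpha = np.linalg.solve(K.T @ K + lam * n**2 * np.eye(n), K.T @ y)
    return X, alpha

def distributed_fit(X, y, m, lam, kernel):
    # Partition the sample into m disjoint subsets and solve locally.
    parts = np.array_split(np.random.permutation(len(X)), m)
    return [local_estimator(X[idx], y[idx], lam, kernel) for idx in parts]

def distributed_predict(models, X_test, kernel):
    # Global estimator: average of the m local estimators.
    preds = [kernel(X_test, Xj) @ aj for Xj, aj in models]
    return np.mean(preds, axis=0)

# Toy usage on synthetic regression data.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (2000, 1))
y = np.sin(np.pi * X[:, 0]) + 0.1 * rng.standard_normal(2000)
models = distributed_fit(X, y, m=10, lam=1e-4, kernel=sigmoid_kernel)
X_test = np.linspace(-1, 1, 5)[:, None]
print(distributed_predict(models, X_test, sigmoid_kernel))
```

Note that the global estimator needs only the subset inputs and local coefficient vectors, so the m local problems can be solved fully in parallel.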

MSC:

68T05 Learning and adaptive systems in artificial intelligence
62J02 General nonlinear regression
68W15 Distributed algorithms
