Coefficient-based regularized distribution regression. (English) Zbl 1539.62117

Summary: In this paper, we consider coefficient-based regularized distribution regression, which aims to regress from probability measures to real-valued responses over a reproducing kernel Hilbert space (RKHS), where the regularization is imposed on the coefficients and the kernels are allowed to be indefinite. The algorithm involves two stages of sampling: the first-stage sample consists of distributions, and the second-stage sample is drawn from these distributions. The asymptotic behavior of the algorithm is studied comprehensively across different regularity ranges of the regression function. Explicit learning rates are derived via kernel mean embedding and integral operator techniques. Under mild conditions we obtain optimal rates, which match the one-stage sampled minimax optimal rate. Compared with existing kernel methods for distribution regression, the algorithm under consideration does not require the kernel to be symmetric or positive semi-definite, and hence provides a simple paradigm for designing indefinite kernel methods, which enriches the theme of distribution regression. To the best of our knowledge, this is the first result for distribution regression with indefinite kernels, and our algorithm can improve learning performance by mitigating the saturation effect.
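The two-stage scheme described above can be sketched in code: each distribution is observed only through a finite sample, its kernel mean embedding is estimated empirically, an outer kernel (here a tanh transform, which is generally indefinite) is applied to the embedding inner products, and the coefficient vector is obtained from an ℓ2-regularized least squares problem that stays well-posed without positive semi-definiteness. This is a minimal illustrative sketch, not the paper's exact estimator; the synthetic data, the Gaussian base kernel, the tanh outer kernel, and all parameter values are assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Two-stage sampling (hypothetical synthetic data) ---
# First stage: T distributions, each Gaussian N(m_t, 1), with response y_t = m_t^2 + noise.
# Second stage: n points drawn from each distribution.
T, n = 40, 50
means = rng.uniform(-1.0, 1.0, T)
samples = [rng.normal(m, 1.0, n) for m in means]
y = means**2 + 0.05 * rng.normal(size=T)

def base_kernel(a, b, gamma=0.5):
    """Gaussian base kernel between two 1-D sample arrays (pairwise matrix)."""
    return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)

# Empirical kernel mean embeddings: <mu_t, mu_s> is estimated by averaging
# the base kernel over all pairs from the two second-stage samples.
M = np.array([[base_kernel(samples[t], samples[s]).mean()
               for s in range(T)] for t in range(T)])

# Outer kernel on the embeddings: a tanh transform, which is in general
# indefinite -- the setting the algorithm is designed to handle.
K = np.tanh(2.0 * M - 0.5)

# Coefficient-based l2 regularization: minimize
#   (1/T) * ||K @ alpha - y||^2 + lam * ||alpha||^2.
# The normal equations involve K.T @ K + lam*T*I, which is positive definite
# even when K itself is indefinite, so the system is always solvable.
lam = 1e-3
alpha = np.linalg.solve(K.T @ K + lam * T * np.eye(T), K.T @ y)

train_pred = K @ alpha
print("training RMSE:", np.sqrt(np.mean((train_pred - y) ** 2)))
```

A new distribution would be predicted by embedding its sample the same way, evaluating the outer kernel against the training embeddings, and taking the inner product with `alpha`; no eigenvalue correction of `K` is needed, which is the practical appeal of regularizing the coefficients rather than the RKHS norm.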

MSC:

62G08 Nonparametric regression and quantile regression
46N30 Applications of functional analysis in probability theory and statistics
65M70 Spectral, collocation and related methods for initial value and initial-boundary value problems involving PDEs
90C30 Nonlinear programming
