The Goldenshluger-Lepski method for constrained least-squares estimators over RKHSs. (English) Zbl 1473.62109

Summary: We study an adaptive estimation procedure called the Goldenshluger-Lepski method [A. Goldenshluger and O. Lepski, Bernoulli 14, No. 4, 1150–1190 (2008; Zbl 1168.62323); Theory Probab. Appl. 57, No. 2, 209–226 (2013; Zbl 1417.62056); translation from Teor. Veroyatn. Primen. 57, No. 2, 257–277 (2012)] in the context of reproducing kernel Hilbert space (RKHS) regression. Adaptive estimation provides a way of selecting tuning parameters for statistical estimators using only the available data. This allows us to perform estimation without making strong assumptions about the estimand. In contrast to procedures such as training and validation, the Goldenshluger-Lepski method uses all of the data to produce non-adaptive estimators for a range of values of the tuning parameters. An adaptive estimator is selected by performing pairwise comparisons between these non-adaptive estimators. Applying the Goldenshluger-Lepski method is non-trivial as it requires a simultaneous high-probability bound on all of the pairwise comparisons. In the RKHS regression context, we choose our non-adaptive estimators to be clipped least-squares estimators constrained to lie in a ball in an RKHS. Applying the Goldenshluger-Lepski method in this context is made more complicated by the fact that we cannot use the \(L^2\)-norm for performing the pairwise comparisons as it is unknown. We use the method to address two regression problems. In the first problem the RKHS is fixed, while in the second problem we adapt over a collection of RKHSs.
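To fix ideas, a schematic version of such a selection rule reads as follows (the notation is illustrative rather than the paper's: \(\hat{f}_r\) denotes the clipped least-squares estimator constrained to the ball of radius \(r\) in the RKHS, \(\mathcal{R}\) a finite grid of radii, \(d_n\) an empirical distance used in place of the unknown \(L^2\)-norm, and \(\sigma(r)\) a high-probability bound on the stochastic error of \(\hat{f}_r\), increasing in \(r\)):
\[
\hat{r} \in \operatorname*{arg\,min}_{r \in \mathcal{R}} \Bigl\{ \max_{s \in \mathcal{R},\, s \ge r} \bigl[\, d_n(\hat{f}_r, \hat{f}_s) - \sigma(r) - \sigma(s) \,\bigr]_+ + 2\,\sigma(r) \Bigr\}.
\]
The maximum over the pairwise comparisons serves as a proxy for the approximation error of \(\hat{f}_r\): whenever \(\hat{f}_r\) is further from a less constrained \(\hat{f}_s\) than the sum of their stochastic errors, the excess must be due to the bias of \(\hat{f}_r\), and the simultaneous high-probability bound mentioned above ensures that all the terms \(\sigma(\cdot)\) are valid at once.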

MSC:

62G05 Nonparametric estimation
62G07 Density estimation
46E22 Hilbert spaces with reproducing kernels (= (proper) functional Hilbert spaces, including de Branges-Rovnyak and other structured spaces)

References:

[1] Bach, F.R., Lanckriet, G.R.G. and Jordan, M.I. (2004). Multiple kernel learning, conic duality, and the SMO algorithm. In Proceedings of the Twenty-First International Conference on Machine Learning 6.
[2] Baraud, Y. (2002). Model selection for regression on a random design. ESAIM Probab. Stat. 6 127-146. · Zbl 1059.62038 · doi:10.1051/ps:2002007
[3] Barron, A., Birgé, L. and Massart, P. (1999). Risk bounds for model selection via penalization. Probab. Theory Related Fields 113 301-413. · Zbl 0946.62036 · doi:10.1007/s004400050210
[4] Baumeister, J. (1987). Stable Solution of Inverse Problems. Advanced Lectures in Mathematics. Braunschweig: Friedr. Vieweg & Sohn. · Zbl 0623.35008 · doi:10.1007/978-3-322-83967-1
[5] Bergh, J. and Löfström, J. (1976). Interpolation Spaces. An Introduction. Berlin: Springer. · Zbl 0344.46071
[6] Birgé, L. (2001). An alternative point of view on Lepski’s method. In State of the Art in Probability and Statistics (Leiden, 1999). Institute of Mathematical Statistics Lecture Notes - Monograph Series 36 113-133. Beachwood, OH: IMS. · Zbl 1373.62142 · doi:10.1214/lnms/1215090065
[7] Birgé, L. and Massart, P. (1997). From model selection to adaptive estimation. In Festschrift for Lucien Le Cam 55-87. New York: Springer. · Zbl 0920.62042
[8] Birgé, L. and Massart, P. (2001). Gaussian model selection. J. Eur. Math. Soc. (JEMS) 3 203-268. · Zbl 1037.62001 · doi:10.1007/s100970100031
[9] Birgé, L. and Massart, P. (2007). Minimal penalties for Gaussian model selection. Probab. Theory Related Fields 138 33-73. · Zbl 1112.62082 · doi:10.1007/s00440-006-0011-8
[10] Blanchard, G., Mathé, P. and Mücke, N. (2019). Lepskii principle in supervised learning. Preprint. Available at arXiv:1905.10764.
[11] Caponnetto, A. and Yao, Y. (2010). Cross-validation based adaptation for regularization operators in learning theory. Anal. Appl. (Singap.) 8 161-183. · Zbl 1209.68405 · doi:10.1142/S0219530510001564
[12] Chapelle, O., Vapnik, V., Bousquet, O. and Mukherjee, S. (2002). Choosing multiple parameters for support vector machines. Mach. Learn. 46. · Zbl 0998.68101
[13] De Vito, E., Pereverzyev, S. and Rosasco, L. (2010). Adaptive kernel methods using the balancing principle. Found. Comput. Math. 10 455-479. · Zbl 1204.68154 · doi:10.1007/s10208-010-9064-2
[14] Eberts, M. and Steinwart, I. (2013). Optimal regression rates for SVMs using Gaussian kernels. Electron. J. Stat. 7 1-42. · Zbl 1337.62073 · doi:10.1214/12-EJS760
[15] Giné, E. and Nickl, R. (2016). Mathematical Foundations of Infinite-Dimensional Statistical Models. New York: Cambridge Univ. Press. · Zbl 1358.62014 · doi:10.1017/CBO9781107337862
[16] Goldenshluger, A. and Lepski, O. (2008). Universal pointwise selection rule in multivariate function estimation. Bernoulli 14 1150-1190. · Zbl 1168.62323 · doi:10.3150/08-BEJ144
[17] Goldenshluger, A. and Lepski, O. (2009). Structural adaptation via \(\mathbb{L}_p\)-norm oracle inequalities. Probab. Theory Related Fields 143 41-71. · Zbl 1149.62020 · doi:10.1007/s00440-007-0119-5
[18] Goldenshluger, A. and Lepski, O. (2011). Bandwidth selection in kernel density estimation: Oracle inequalities and adaptive minimax optimality. Ann. Statist. 39 1608-1632. · Zbl 1234.62035 · doi:10.1214/11-AOS883
[19] Goldenshluger, A. and Pereverzev, S.V. (2003). On adaptive inverse estimation of linear functionals in Hilbert scales. Bernoulli 9 783-807. · Zbl 1055.62034 · doi:10.3150/bj/1066418878
[20] Goldenshluger, A.V. and Lepski, O.V. (2013). General selection rule from a family of linear estimators. Theory Probab. Appl. 57 209-226. · Zbl 1417.62056 · doi:10.1137/S0040585X97985923
[21] Gönen, M. and Alpaydın, E. (2011). Multiple kernel learning algorithms. J. Mach. Learn. Res. 12 2211-2268. · Zbl 1254.68204
[22] Hofmann, B. (2005). Approximate source conditions in regularization and an application to multiplication operators. In Proceedings: 10th International Conference “Mathematical Modelling and Analysis 2005” and 2nd International Conference “Computational Methods in Applied Mathematics” 29-34. Vilnius: Technika.
[23] Kreĭn, S.G. and Petunin, J.I. (1966). Scales of Banach spaces. Uspekhi Mat. Nauk 21 89-168. · Zbl 0173.15702
[24] Lepski, O. (1991). On a problem of adaptive estimation in Gaussian white noise. Theory Probab. Appl. 35 454-466. · Zbl 0745.62083
[25] Lepskiĭ, O.V. (1991). Asymptotically minimax adaptive estimation. I. Upper bounds. Optimally adaptive estimates. Teor. Veroyatn. Primen. 36 645-659. · Zbl 0776.62039 · doi:10.1137/1136085
[26] Lepskiĭ, O.V. (1992). Asymptotically minimax adaptive estimation. II. Schemes without optimal adaptation. Adaptive estimates. Teor. Veroyatn. Primen. 37 468-481. · Zbl 0787.62087 · doi:10.1137/1137095
[27] Lu, S., Mathé, P. and Pereverzev, S.V. (2020). Balancing principle in supervised learning for a general regularization scheme. Appl. Comput. Harmon. Anal. 48 123-148. · Zbl 07140131 · doi:10.1016/j.acha.2018.03.001
[28] Lu, S., Mathé, P. and Pereverzyev, S. Jr. (2019). Analysis of regularized Nyström subsampling for regression functions of low smoothness. Anal. Appl. (Singap.) 17 931-946. · Zbl 1440.68250 · doi:10.1142/S0219530519500039
[29] Mair, B.A. and Ruymgaart, F.H. (1996). Statistical inverse estimation in Hilbert scales. SIAM J. Appl. Math. 56 1424-1444. · Zbl 0864.62020 · doi:10.1137/S0036139994264476
[30] Mathé, P. and Pereverzev, S.V. (2001). Optimal discretization of inverse problems in Hilbert scales. Regularization and self-regularization of projection methods. SIAM J. Numer. Anal. 38 1999-2021. · Zbl 1049.65046 · doi:10.1137/S003614299936175X
[31] Oneto, L., Ridella, S. and Anguita, D. (2016). Tikhonov, Ivanov and Morozov regularization for support vector machine learning. Mach. Learn. 103 103-136. · Zbl 1357.68179 · doi:10.1007/s10994-015-5540-x
[32] Page, S. and Grünewälder, S. (2019). Ivanov-regularised least-squares estimators over large RKHSs and their interpolation spaces. J. Mach. Learn. Res. 20 120. · Zbl 1441.62178
[33] Page, S. and Grünewälder, S. (2021). Supplement to “The Goldenshluger-Lepski method for constrained least-squares estimators over RKHSs.” · Zbl 1473.62109 · doi:10.3150/20-BEJ1307SUPP
[34] Pereverzyev, S. and Tkachenko, P. (2017). Regularization by the linear functional strategy with multiple kernels. Front. Appl. Math. Stat. 3.
[35] Pricop-Jeckstadt, M. (2019). Nonlinear Tikhonov regularization in Hilbert scales with balancing principle tuning parameter in statistical inverse problems. Inverse Probl. Sci. Eng. 27 205-236. · Zbl 1460.65061 · doi:10.1080/17415977.2018.1454918
[36] Smale, S. and Zhou, D.-X. (2003). Estimating the approximation error in learning theory. Anal. Appl. (Singap.) 1 17-41. · Zbl 1079.68089 · doi:10.1142/S0219530503000089
[37] Steinwart, I. and Christmann, A. (2008). Support Vector Machines. New York: Springer. · Zbl 1203.68171
[38] Steinwart, I., Hush, D.R. and Scovel, C. (2009). Optimal rates for regularized least squares regression. In The 22nd Conference on Learning Theory. · Zbl 1127.68090
[39] Van der Vaart, A.W. and Wellner, J.A. (1996). Weak Convergence and Empirical Processes. New York: Springer. · Zbl 0862.60002 · doi:10.1007/978-1-4757-2545-2
[40] Williams, D. (1991). Probability with Martingales. Cambridge: Cambridge Univ. Press. · Zbl 0722.60001 · doi:10.1017/CBO9780511813658