
Fast learning from \(\alpha\)-mixing observations. (English) Zbl 1359.62242

Summary: We present a new oracle inequality for generic regularized empirical risk minimization algorithms learning from stationary \(\alpha\)-mixing processes. Our main tool for deriving this inequality is a rather involved version of the so-called peeling method. We then use this oracle inequality to derive learning rates for several learning methods, namely empirical risk minimization (ERM), least squares support vector machines (SVMs) with generic kernels, and SVMs with Gaussian RBF kernels for both least squares and quantile regression. It turns out that for i.i.d. processes our learning rates for ERM and for SVMs with Gaussian kernels match the optimal rates up to an arbitrarily small extra term in the exponent, while in the remaining cases our rates are at least close to the optimal rates.
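For orientation, recall the standard notion used throughout: a stationary process \((X_i)_{i \ge 1}\) is \(\alpha\)-mixing (strongly mixing) if its mixing coefficients
\[ \alpha(n) = \sup_{k \ge 1} \; \sup_{A \in \sigma(X_1,\dots,X_k),\, B \in \sigma(X_{k+n},X_{k+n+1},\dots)} \bigl| P(A \cap B) - P(A)P(B) \bigr| \]
tend to zero as \(n \to \infty\). The sketch below is not the authors' implementation; it is a minimal Python illustration of one of the learning methods discussed, a least squares SVM (kernel ridge regression) with a Gaussian RBF kernel, trained on a toy AR(1) sample, which is a classical example of an \(\alpha\)-mixing sequence. All concrete choices (the AR(1) generator, the bandwidth sigma, the regularization parameter lam) are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def ar1_sample(n, phi=0.5):
    # A stationary AR(1) process with |phi| < 1 is a classical example of an
    # alpha-mixing (in fact geometrically mixing) sequence.
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal(scale=np.sqrt(1.0 - phi**2))
    return x

def rbf_kernel(a, b, sigma=1.0):
    # Gaussian RBF kernel k(x, x') = exp(-|x - x'|^2 / (2 sigma^2)) for 1-d inputs.
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-d2 / (2.0 * sigma**2))

def lssvm_fit(x, y, lam=1e-2, sigma=1.0):
    # Least squares SVM / kernel ridge regression: minimize
    #   lam * ||f||_H^2 + (1/n) * sum_i (y_i - f(x_i))^2
    # over the RKHS H; the representer theorem reduces this to
    #   (K + n * lam * I) alpha = y.
    n = len(x)
    K = rbf_kernel(x, x, sigma)
    return np.linalg.solve(K + n * lam * np.eye(n), y)

def lssvm_predict(alpha, x_train, x_new, sigma=1.0):
    return rbf_kernel(x_new, x_train, sigma) @ alpha

# Toy regression with dependent (alpha-mixing) inputs; sigma and lam are ad hoc.
n = 400
x = ar1_sample(n)
y = np.sin(2.0 * x) + 0.1 * rng.normal(size=n)
alpha = lssvm_fit(x, y, lam=1e-2, sigma=0.5)
x_test = np.linspace(-2.0, 2.0, 5)
print(lssvm_predict(alpha, x, x_test, sigma=0.5))

Replacing the squared loss by the pinball loss \(L_\tau(y,t) = \max(\tau (y-t), (\tau-1)(y-t))\) in the regularized empirical risk gives the quantile regression variant mentioned above.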

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62G08 Nonparametric regression and quantile regression
62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)
68T05 Learning and adaptive systems in artificial intelligence
