
Convergence guarantees for forward gradient descent in the linear regression model. (English) Zbl 07901368

Summary: Renewed interest in the relationship between artificial and biological neural networks motivates the study of gradient-free methods. Considering the linear regression model with random design, we theoretically analyze in this work the biologically motivated (weight-perturbed) forward gradient scheme, which is based on a random linear combination of the gradient. If \(d\) denotes the number of parameters and \(k\) the number of samples, we prove that the mean squared error of this method converges for \(k \gtrsim d^{2} \log (d)\) with rate \(d^{2} \log (d) / k\). Compared to the dimension dependence \(d\) for stochastic gradient descent, an additional factor \(d \log (d)\) occurs.
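A minimal simulation sketch of such a weight-perturbed forward gradient iteration in the linear regression setting may look as follows; the Gaussian design, the squared loss, and the decaying step size are illustrative assumptions and not the authors' exact specification. In practice the directional derivative \(\nabla L \cdot v\) is obtained by forward-mode automatic differentiation without ever forming \(\nabla L\); here the gradient is written out explicitly since it is available in closed form.

```python
import numpy as np

# Illustrative sketch of weight-perturbed forward gradient descent
# for linear regression with random design (step size is a hypothetical choice).
rng = np.random.default_rng(0)

d, k_max = 10, 50000
theta_star = rng.normal(size=d)          # true regression vector
theta = np.zeros(d)                      # initial iterate

for k in range(1, k_max + 1):
    x = rng.normal(size=d)               # random design vector X_k
    y = x @ theta_star + rng.normal()    # response with additive noise
    grad = 2.0 * (x @ theta - y) * x     # gradient of the squared loss (X_k^T theta - Y_k)^2
    v = rng.normal(size=d)               # weight perturbation ~ N(0, I_d)
    fwd_grad = (grad @ v) * v            # (grad . v) v, unbiased for grad since E[v v^T] = I_d
    alpha = 1.0 / (d * (k + d ** 2))     # illustrative decaying step size
    theta -= alpha * fwd_grad

print("squared estimation error:", np.sum((theta - theta_star) ** 2))
```

Since \(\mathbb{E}[v v^{\top}] = I_d\), the update direction \((\nabla L \cdot v)\, v\) is an unbiased but higher-variance surrogate for the gradient, which is consistent with the extra factor \(d \log(d)\) in the rate compared to stochastic gradient descent.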

MSC:

62L20 Stochastic approximation
62J05 Linear regression; mixed models
