Approximation properties of residual neural networks for Kolmogorov PDEs. (English) Zbl 1518.65018

Summary: In recent years, residual neural networks (ResNets), as introduced by K. He et al. [“Deep residual learning for image recognition”, in: Proceedings of the 2016 IEEE conference on computer vision and pattern recognition, CVPR 2016. Los Alamitos, CA: IEEE Computer Society. 770–778 (2016; doi:10.1109/CVPR.2016.90)], have become very popular in a large number of applications, including image classification and segmentation. They provide a new perspective on training very deep neural networks without suffering from the vanishing gradient problem. In this article we show that ResNets are able to approximate solutions of Kolmogorov partial differential equations (PDEs) with constant diffusion and possibly nonlinear drift coefficients without suffering the curse of dimensionality, i.e., the number of parameters of the approximating ResNets grows at most polynomially in the reciprocal of the approximation accuracy \(\varepsilon > 0\) and in the dimension \(d\in \mathbb{N}\) of the considered PDE. We adapt a proof of A. Jentzen et al. [Commun. Math. Sci. 19, No. 5, 1167–1205 (2021; Zbl 1475.65157)], who showed a similar result for feedforward neural networks (FNNs), to ResNets. In contrast to FNNs, the Euler-Maruyama approximation structure of ResNets substantially simplifies the construction of the approximating ResNets. Moreover, contrary to [Jentzen et al., loc. cit.], our proof for ResNets does not require the existence of an FNN representing the identity map, which enlarges the set of applicable activation functions.
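For orientation, the correspondence the summary alludes to can be sketched as follows (the notation here is ours, not taken from the paper). The Kolmogorov PDEs under consideration are of the form
\[ \frac{\partial u}{\partial t}(t,x) = \frac{1}{2}\operatorname{Trace}\bigl(\sigma \sigma^{*} (\operatorname{Hess}_x u)(t,x)\bigr) + \bigl\langle \mu(x), (\nabla_x u)(t,x)\bigr\rangle, \qquad u(0,x) = \varphi(x), \]
with constant diffusion matrix \(\sigma\) and possibly nonlinear drift \(\mu\); by the Feynman-Kac formula the solution satisfies \(u(t,x) = \mathbb{E}[\varphi(X_t^x)]\), where \(dX_t^x = \mu(X_t^x)\,dt + \sigma\,dW_t\). A single Euler-Maruyama step for this SDE,
\[ Y_{k+1} = Y_k + \mu(Y_k)\,h + \sigma\bigl(W_{(k+1)h} - W_{kh}\bigr), \]
has precisely the skip-connection form \(y_{k+1} = y_k + f_{\theta}(y_k)\) of a residual block. This is why the ResNet architecture matches the approximation scheme so naturally, and why no identity-representing subnetwork is needed in the construction.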

MSC:

65C99 Probabilistic methods, stochastic differential equations
65M75 Probabilistic methods, particle methods, etc. for initial value and initial-boundary value problems involving PDEs
68T07 Artificial neural networks and deep learning

Citations:

Zbl 1475.65157

Software:

Inception-v4

References:

[1] M. Asaduzzaman, M. Shahjahan and K. Murase, Faster training using fusion of activation functions for feed forward neural networks, Int. J. Neural Syst., 19, 437-448 (2009) · doi:10.1142/S0129065709002130
[2] B. Avelin and K. Nyström, Neural ODEs as the deep limit of ResNets with constant weights, Anal. Appl., 19, 397-437 (2021) · Zbl 1539.68293 · doi:10.1142/S0219530520400023
[3] C. Beck, S. Becker, P. Grohs, N. Jaafari and A. Jentzen, Solving the Kolmogorov PDE by means of deep learning, J. Sci. Comput., 88 (2021), Paper No. 73, 28pp. · Zbl 1490.65006
[4] C. Beck, W. E and A. Jentzen, Machine learning approximation algorithms for high-dimensional fully nonlinear partial differential equations and second-order backward stochastic differential equations, J. Nonlinear Sci., 29, 1563-1619 (2019) · Zbl 1442.91116 · doi:10.1007/s00332-018-9525-3
[5] W. E, C. Ma and Q. Wang, A priori estimates of the population risk for residual networks, arXiv: 1903.02154, 2019, 19 pages.
[6] W. E and B. Yu, The deep Ritz method: A deep learning-based numerical algorithm for solving variational problems, Commun. Math. Stat., 6, 1-12 (2018) · Zbl 1392.35306 · doi:10.1007/s40304-018-0127-z
[7] D. Elbrächter, P. Grohs, A. Jentzen and C. Schwab, DNN expression rate analysis of high-dimensional PDEs: application to option pricing, Constr. Approx., 55, 3-71 (2022) · Zbl 1500.35009 · doi:10.1007/s00365-021-09541-6
[8] R. Gribonval, G. Kutyniok, M. Nielsen and F. Voigtlaender, Approximation spaces of deep neural networks, Constr. Approx., 55, 259-367 (2022) · Zbl 1491.82017 · doi:10.1007/s00365-021-09543-4
[9] P. Grohs and L. Herrmann, Deep neural network approximation for high-dimensional elliptic PDEs with boundary conditions, IMA J. Numer. Anal., 42, 2055-2082 (2022) · Zbl 1502.65264 · doi:10.1093/imanum/drab031
[10] P. Grohs, F. Hornung, A. Jentzen and P. von Wurstemberger, A proof that artificial neural networks overcome the curse of dimensionality in the numerical approximation of Black-Scholes partial differential equations, Accepted in Mem. Amer. Math. Soc., arXiv: 1809.02362 (2018), 124 pages.
[11] P. Grohs, F. Hornung, A. Jentzen and P. Zimmermann, Space-time error estimates for deep neural network approximations for differential equations, Revision requested from Adv. Comput. Math., arXiv: 1908.03833, (2019), 86 pages.
[12] P. Grohs, A. Jentzen and D. Salimova, Deep neural network approximations for solutions of PDEs based on Monte Carlo algorithms, Partial Differ. Equ. Appl., 3 (2022), Paper No. 45, 41pp. · Zbl 1490.65232
[13] I. Gruber, M. Hlaváč, M. Železný and A. Karpov, Facing face recognition with ResNet: Round one, Interactive Collaborative Robotics, 10459, 67-74 (2017) · doi:10.1007/978-3-319-66471-2_8
[14] M. Hairer, M. Hutzenthaler and A. Jentzen, Loss of regularity for Kolmogorov equations, Ann. Probab., 43, 468-527 (2015) · Zbl 1322.35083 · doi:10.1214/13-AOP838
[15] J. Han, A. Jentzen and W. E, Solving high-dimensional partial differential equations using deep learning, Proc. Natl. Acad. Sci. USA, 115, 8505-8510 (2018) · Zbl 1416.35137 · doi:10.1073/pnas.1718942115
[16] K. He, X. Zhang, S. Ren and J. Sun, Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification, In Proceedings of the IEEE International Conference on Computer Vision, (2015), 1026-1034.
[17] K. He, X. Zhang, S. Ren and J. Sun, Deep residual learning for image recognition, In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2016), 770-778.
[18] K. He, X. Zhang, S. Ren and J. Sun, Identity mappings in deep residual networks, In European Conference on Computer Vision, (2016), 630-645.
[19] F. Hornung, A. Jentzen and D. Salimova, Space-time deep neural network approximations for high-dimensional partial differential equations, arXiv: 2006.02199, (2020), 52 pages.
[20] A. Jentzen, D. Salimova and T. Welti, A proof that deep artificial neural networks overcome the curse of dimensionality in the numerical approximation of Kolmogorov partial differential equations with constant diffusion and nonlinear drift coefficients, Commun. Math. Sci., 19, 1167-1205 (2021) · Zbl 1475.65157 · doi:10.4310/CMS.2021.v19.n5.a1
[21] A. Kolmogoroff, Über die analytischen Methoden in der Wahrscheinlichkeitsrechnung, Math. Ann., 104, 415-458 (1931) · Zbl 0001.14902 · doi:10.1007/BF01457949
[22] N. V. Krylov, M. Röckner and J. Zabczyk, Stochastic PDE’s and Kolmogorov Equations in Infinite Dimensions, vol. 1715 of Lecture Notes in Mathematics, Springer-Verlag, Berlin; Centro Internazionale Matematico Estivo (C.I.M.E.), Florence, 1999. · Zbl 0927.00037
[23] G. Kutyniok, P. Petersen, M. Raslan and R. Schneider, A theoretical analysis of deep neural networks and parametric PDEs, Constr. Approx., 55, 73-125 (2022) · Zbl 07493717 · doi:10.1007/s00365-021-09551-4
[24] J. Müller, On the space-time expressivity of ResNets, In ICLR 2020 Workshop on Integration of Deep Neural Models and Differential Equations, (2020).
[25] E. Orhan and X. Pitkow, Skip connections eliminate singularities, In International Conference on Learning Representations, (2018).
[26] P. K. Panigrahi, S. Ghosh and D. R. Parhi, Navigation of autonomous mobile robot using different activation functions of wavelet neural network, Arch. Control Sci., 25, 21-34 (2015) · Zbl 1446.93059 · doi:10.1515/acsc-2015-0002
[27] P. Petersen and F. Voigtlaender, Optimal approximation of piecewise smooth functions using deep ReLU neural networks, Neural Networks, 108, 296-330 (2018) · Zbl 1434.68516
[28] T. Qin, K. Wu and D. Xiu, Data driven governing equations approximation using deep neural networks, J. Comput. Phys., 395, 620-635 (2019) · Zbl 1455.65125 · doi:10.1016/j.jcp.2019.06.042
[29] P. Ramachandran, B. Zoph and Q. V. Le, Searching for Activation Functions, arXiv: 1710.05941, (2017), 13 pages.
[30] C. Szegedy, S. Ioffe, V. Vanhoucke and A. A. Alemi, Inception-v4, inception-resnet and the impact of residual connections on learning, In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, (2017), AAAI’17, AAAI Press, 4278-4284.
[31] Y. Wang, Y. Li, Y. Song and X. Rong, The influence of the activation function in a convolution neural network model of facial expression recognition, Appl. Sci., 10, 1897 (2020)
[32] Z. Wu, C. Shen and A. van den Hengel, Wider or deeper: Revisiting the ResNet model for visual recognition, Pattern Recognition, 90, 119-133 (2019)
[33] K. Zhang, M. Sun, T. X. Han, X. Yuan, L. Guo and T. Liu, Residual networks of residual networks: Multilevel residual networks, IEEE Transactions on Circuits and Systems for Video Technology, 28, 1303-1314 (2017)