Connections between numerical algorithms for PDEs and neural networks. (English) Zbl 07694853

Summary: We investigate numerous structural connections between numerical algorithms for partial differential equations (PDEs) and neural architectures. Our goal is to transfer the rich set of mathematical foundations from the world of PDEs to neural networks. Besides structural insights, we provide concrete examples and experimental evaluations of the resulting architectures. Using the example of generalised nonlinear diffusion in 1D, we consider explicit schemes, acceleration strategies for them, implicit schemes, and multigrid approaches. We connect these concepts to residual networks, recurrent neural networks, and U-net architectures. Our findings inspire a symmetric residual network design with provable stability guarantees and justify the effectiveness of skip connections in neural networks from a numerical perspective. Moreover, we present U-net architectures that implement multigrid techniques for learning efficient solutions of PDE models, and we motivate uncommon design choices such as trainable nonmonotone activation functions. Experimental evaluations show that the proposed architectures save half of the trainable parameters and can thus outperform standard architectures of the same model complexity. Our considerations serve as a basis for explaining the success of popular neural architectures and provide a blueprint for developing new, mathematically well-founded neural building blocks.
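
The following is a minimal NumPy sketch (not code from the paper) of the explicit-scheme/residual-network correspondence described above: one explicit step of 1D nonlinear diffusion, u^{k+1} = u^k - tau * D^T (g(|D u^k|^2) * D u^k) with a forward-difference operator D, has exactly the form of a residual block, identity plus update, with the symmetric weight pairing D and -D^T. The grid size, the time step tau, the contrast parameter lam, and the Perona-Malik type diffusivity are illustrative assumptions.

```python
import numpy as np

def explicit_diffusion_step(u, tau=0.25, lam=0.1):
    """One explicit step of 1D nonlinear diffusion in residual form.

    Computes u + tau * (-D^T)(g(|D u|^2) * (D u)) with a forward
    difference D and reflecting (Neumann) boundaries -- structurally
    an untrained symmetric residual block: the weights D are paired
    with their negated transpose -D^T.
    """
    du = np.append(np.diff(u), 0.0)      # D u: forward differences, zero flux at the boundary
    flux = du / (1.0 + (du / lam) ** 2)  # Perona-Malik flux g(s^2)*s, a nonmonotone "activation"
    div = np.diff(flux, prepend=0.0)     # -D^T flux: backward-difference divergence
    return u + tau * div                 # skip connection: identity plus update

# Example: diffusing a noisy step signal smooths the noise while preserving the edge.
u = np.concatenate([np.zeros(50), np.ones(50)]) + 0.05 * np.random.randn(100)
for _ in range(100):
    u = explicit_diffusion_step(u)
```

Note that the inner flux function s -> s / (1 + (s/lam)^2) is nonmonotone, mirroring the summary's point that diffusion blocks motivate trainable nonmonotone activation functions; with unit grid size and a diffusivity bounded by 1, the classical explicit-scheme stability condition tau <= 1/2 applies, so the default tau = 0.25 yields a stable step.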

MSC:

68-XX Computer science
94-XX Information and communication theory, circuits
