
Multi-stage neural networks: function approximator of machine precision. (English) Zbl 07842853

Summary: Deep learning techniques are increasingly applied to scientific problems, where the precision of networks is crucial. Despite being deemed universal function approximators, neural networks in practice struggle to reduce prediction errors below \(O(10^{-5})\), even with large network sizes and extended training. To address this issue, we develop multi-stage neural networks, which divide the training process into stages, with each stage using a new network optimized to fit the residue from the previous stage. Across successive stages, the residue magnitude decreases substantially and follows an inverse power-law relationship with the residue frequency. Multi-stage neural networks effectively mitigate the spectral bias associated with regular neural networks, enabling them to capture the high-frequency features of target functions. We demonstrate that the prediction error of multi-stage training, for both regression problems and physics-informed neural networks, can nearly reach the machine precision \(O(10^{-16})\) of double floating-point numbers within a finite number of iterations. Such levels of accuracy are rarely attainable with single neural networks alone.
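The staged residue-fitting idea lends itself to a compact illustration. Below is a minimal sketch in PyTorch for a 1D regression problem; the target function, network size, stage count, and residue normalization are illustrative assumptions, not the authors' configuration. It only shows the general scheme: each new network is trained on the residue left by the sum of the previous stages.

```python
import torch
import torch.nn as nn

# Minimal sketch of multi-stage training (hypothetical setup, not the authors' code):
# each stage fits a small MLP to the residue left by the sum of all previous stages.

def make_mlp(width=64, depth=3):
    layers, d_in = [], 1
    for _ in range(depth):
        layers += [nn.Linear(d_in, width), nn.Tanh()]
        d_in = width
    layers += [nn.Linear(d_in, 1)]
    return nn.Sequential(*layers)

x = torch.linspace(-1.0, 1.0, 512).unsqueeze(1)
y = torch.sin(4 * torch.pi * x)              # illustrative target function

stages, scales = [], []
residue = y.clone()
for stage in range(3):                        # number of stages is a free parameter
    scale = residue.abs().max()               # rescale the residue so each stage trains at O(1)
    net = make_mlp()
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(5000):
        opt.zero_grad()
        loss = ((net(x) - residue / scale) ** 2).mean()
        loss.backward()
        opt.step()
    stages.append(net)
    scales.append(scale)
    with torch.no_grad():
        residue = residue - scale * net(x)    # what the next stage must fit
    print(f"stage {stage}: max residue = {residue.abs().max().item():.3e}")

# final prediction = sum of all stage outputs
with torch.no_grad():
    y_pred = sum(s * net(x) for s, net in zip(scales, stages))
```

The essential design choice in this sketch is rescaling each stage's residue to order one before training, so that later stages never have to resolve vanishingly small targets directly; the stage outputs are then summed back with their scales to form the final prediction.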

MSC:

68Txx Artificial intelligence
65Mxx Numerical methods for partial differential equations, initial value and time-dependent initial-boundary value problems
65Nxx Numerical methods for partial differential equations, boundary value problems
