Optimization of random feature method in the high-precision regime. (English) Zbl 1543.65202

Summary: Machine learning has been widely used to solve partial differential equations (PDEs) in recent years; among these approaches, the random feature method (RFM) exhibits spectral accuracy and can compete with traditional solvers in both accuracy and efficiency. However, the optimization problem in the RFM is potentially more difficult to solve than those that arise in traditional methods. Unlike the broader machine-learning literature, which frequently targets tasks in the low-precision regime, our study focuses on the high-precision regime crucial for solving PDEs. In this work, we study this problem from the following aspects: (i) we analyze the coefficient matrix that arises in the RFM by studying the distribution of its singular values; (ii) we investigate whether continued training causes overfitting; (iii) we test direct, iterative, and randomized methods for solving the optimization problem. Based on these results, we find that direct methods are superior to the alternatives when memory is not an issue, while iterative methods typically achieve lower accuracy, which preconditioning can improve to some extent.
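To make items (i) and (iii) concrete, the sketch below (not taken from the paper; the tanh activation, the uniform feature distribution, and the solver tolerances are illustrative assumptions) builds a small random feature matrix for a 1-D fitting problem, inspects its singular-value distribution, and compares an SVD-based direct least-squares solve against the iterative LSMR solver. On such ill-conditioned matrices the direct solve typically reaches a much smaller residual, in line with the paper's conclusion.

```python
# Minimal sketch (not from the paper): fit a smooth 1-D target with random
# features and compare a direct least-squares solve against iterative LSMR.
# The feature distribution, activation, and tolerances are assumptions.
import numpy as np
from scipy.linalg import lstsq
from scipy.sparse.linalg import lsmr

rng = np.random.default_rng(0)
n_points, n_features = 2000, 400   # overdetermined: more collocation points than features

# Collocation points and a smooth target function
x = np.linspace(0.0, 1.0, n_points)
f = np.sin(4 * np.pi * x)

# Random feature matrix A[i, j] = tanh(w_j * x_i + b_j) with random w, b
w = rng.uniform(-20.0, 20.0, n_features)
b = rng.uniform(-20.0, 20.0, n_features)
A = np.tanh(np.outer(x, w) + b)

# (i) Singular-value distribution of the coefficient matrix
s = np.linalg.svd(A, compute_uv=False)
print(f"condition number: {s[0] / s[-1]:.2e}")

# (iii) Direct method (SVD-based lstsq) vs. iterative method (LSMR)
u_direct, *_ = lstsq(A, f)
u_iter = lsmr(A, f, atol=1e-14, btol=1e-14, maxiter=10_000)[0]

print(f"direct residual:    {np.linalg.norm(A @ u_direct - f):.2e}")
print(f"iterative residual: {np.linalg.norm(A @ u_iter - f):.2e}")
```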

MSC:

65N35 Spectral, collocation and related methods for boundary value problems involving PDEs
65-04 Software, source code, etc. for problems pertaining to numerical analysis
65F20 Numerical solutions to overdetermined systems, pseudoinverses
65Y05 Parallel numerical computation
68T05 Learning and adaptive systems in artificial intelligence
Full Text: DOI
