×

Sparse additive function decompositions facing basis transforms. (English) Zbl 07927554

Summary: High-dimensional real-world systems can often be well characterized by a small number of simultaneous low-complexity interactions. The analysis of variance (ANOVA) decomposition and the anchored decomposition are typical techniques to find sparse additive decompositions of functions. In this paper, we are interested in a setting, where these decompositions are not directly sparse, but become so after an appropriate basis transform. Noting that the sparsity of those additive function decompositions is equivalent to the fact that most of its mixed partial derivatives vanish, we can exploit a connection to the underlying function graphs to determine an orthogonal transform that realizes the appropriate basis change. This is done in three steps: we apply singular value decomposition to minimize the number of vertices of the function graph, and joint block diagonalization techniques of families of matrices followed by sparse minimization based on relaxations of the zero “norm” for minimizing the number of edges. For the latter one, we propose and analyze minimization techniques over the manifold of special orthogonal matrices. Various numerical examples illustrate the reliability of our approach for functions having, after a basis transform, a sparse additive decomposition into summands with at most two variables.

MSC:

26Bxx Functions of several variables
33F05 Numerical approximation and evaluation of special functions
58C05 Real-valued functions on manifolds
90C26 Nonconvex programming, global optimization
65Kxx Numerical methods for mathematical programming, optimization and variational techniques
65F25 Orthogonalization in numerical linear algebra
15B99 Special matrices

References:

[1] P. Ablin and G. Peyré, Fast and accurate optimization on the orthogonal manifold without retraction, In G. Camps-Valls, F. J. R. Ruiz, and I. Valera, editors, Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, volume 151 of Proceedings of Machine Learning Research, PMLR, 2022, 5636-5657.
[2] P. A. Absil, R. Mahony and B. Andrews, Convergence of the iterates of descent methods for analytic cost functions, SIAM Journal on Optimization, 16 (2005), 531-547. doi: 10.1137/040605266. · Zbl 1092.90036 · doi:10.1137/040605266
[3] R. Agarwal, L. Melnick, N. Frosst, X. Zhang, B. Lengerich, R. Caruana and G. E. Hinton, Neural additive models: Interpretable machine learning with neural nets, Advances in Neural Information Processing Systems, 34 (2021), 4699-4711.
[4] H. Attouch, J. Bolte, P. Redont and A. Soubeyran, Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka-Łojasiewicz inequality, Mathematics of Operations Research, 35 (2010), 438-457. doi: 10.1287/moor.1100.0449. · Zbl 1214.65036 · doi:10.1287/moor.1100.0449
[5] F. A. Ba and M. Quellmalz, Accelerating the sinkhorn algorithm for sparse multi-marginal optimal transport via fast Fourier transforms, Algorithms, 15 (2022), 311. doi: 10.3390/a15090311. · doi:10.3390/a15090311
[6] J. Baldeaux and M. Gnewuch, Optimal randomized multilevel algorithms for infinite-dimensional integration on function spaces with ANOVA-type decomposition, SIAM Journal on Numerical Analysis, 52 (2014), 1128-1155. doi: 10.1137/120896001. · Zbl 1318.65002 · doi:10.1137/120896001
[7] F. Bartel, D. Potts and M. Schmischke, Grouped transformations and regularization in high-dimensional explainable ANOVA approximation, SIAM Journal on Scientific Computing, 44 (2022), A1606-A1631. doi: 10.1137/20M1374547. · Zbl 1515.62018 · doi:10.1137/20M1374547
[8] F. Beier, J. von Lindheim, S. Neumayer and G. Steidl, Unbalanced multi-marginal optimal transport, Journal of Mathematical Imaging and Vision, 65 (2023), 394-413. doi: 10.1007/s10851-022-01126-7. · Zbl 07695353 · doi:10.1007/s10851-022-01126-7
[9] J. Bilmes and C. Bartels, Graphical model architectures for speech recognition, IEEE Signal Processing Magazine, 22 (2005), 89-100. doi: 10.1109/MSP.2005.1511827. · doi:10.1109/MSP.2005.1511827
[10] J. Bilmes and G. Zweig, The graphical models toolkit: An open source software system for speech and time-series processing, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, 4 (2002), 3916-3919.
[11] N. Boumal, An Introduction to Optimization on Smooth manifolds, Cambridge University Press, Cambridge, 2023. · Zbl 1532.90001
[12] N. Boumal, P.-A. Absil and C. Cartis, Global rates of convergence for nonconvex optimization on manifolds, IMA Journal of Numerical Analysis, 39 (2019), 1-33. doi: 10.1093/imanum/drx080. · Zbl 1483.65092 · doi:10.1093/imanum/drx080
[13] R. E. Caflisch, Valuation of morgage backed securities using Brownian bridges to reduce effective dimension, The Journal of Computational Finance, 1 (1997), 27-46.
[14] C.-H. Chang, R. Caruana and A. Goldenberg, Node-gam: Neural generalized additive model for interpretable deep learning, arXiv: 2106.01613, 2021.
[15] P. M. Cohn, Basic Algebra: Groups, Rings and Fields, Springer Science & Business Media, 2012. doi: 10.1090/conm/575/11387. · doi:10.1090/conm/575/11387
[16] J. Dick, F. Y. Kuo and I. H. Sloan, High-dimensional integration: The quasi-Monte Carlo way, Acta Numerica, 22 (2013), 133-288. doi: 10.1017/S0962492913000044. · Zbl 1296.65004 · doi:10.1017/S0962492913000044
[17] J. Enouen and Y. Liu, Sparse interaction additive networks via feature interaction detection and sparse selection, Advances in Neural Information Processing Systems, 35 (2022), 13908-13920.
[18] M. Griebel and M. Holtz, Dimension-wise integration of high-dimensional functions with applications to finance, Journal of Complexity, 26 (2010), 455-489. doi: 10.1016/j.jco.2010.06.001. · Zbl 1203.65056 · doi:10.1016/j.jco.2010.06.001
[19] M. Griebel, F. Y. Kuo and I. H. Sloan, The ANOVA decomposition of a non-smooth function of infinitely many variables can have every term smooth, Mathematics of Computations, 86 (2017), 1855-1876. doi: 10.1090/mcom/3171. · Zbl 1365.41021 · doi:10.1090/mcom/3171
[20] C. R. Harris, K. J. Millman, S. J. van der Walt, R. Gommers, P. Virtanen, D. Cournapeau, E. Wieser, J. Taylor, S. Berg, N. J. Smith, R. Kern, M. Picus, S. Hoyer, M. H. van Kerkwijk, M. Brett, A. Haldane, J. F. del Río, M. Wiebe, P. Peterson, P. Gérard-Marchant, K. Sheppard, T. Reddy, W. Weckesser, H. Abbasi, C. Gohlke and T. E. Oliphant, Array programming with NumPy, Nature, 585 (2020), 357-362. doi: 10.1038/s41586-020-2649-2. · doi:10.1038/s41586-020-2649-2
[21] M. Hasannasab, J. Hertrich, S. Neumayer, G. Plonka, S. Setzer and G. Steidl, Parseval proximal neural networks, Journal of Fourier Analysis and Applications, 26 (2020), Paper No. 59, 31 pp. doi: 10.1007/s00041-020-09761-7. · Zbl 1489.68224 · doi:10.1007/s00041-020-09761-7
[22] M. Hefter, K. Ritter and G. W. Wasilkowski, On equivalence of weighted anchored and ANOVA spaces of functions with mixed smoothness of order one in \(l_1\) or \(l_{\operatorname{\infty}} \), Journal of Complexity, 32 (2016), 1-19. doi: 10.1016/j.jco.2015.07.001. · Zbl 1342.46032 · doi:10.1016/j.jco.2015.07.001
[23] J. Hertrich, F. A. Ba and G. Steidl, Sparse mixture models inspired by ANOVA decompositions, Electronic Transactions on Numerical Analysis, 55 (2022), 142-168. doi: 10.1553/etna_vol55s142. · Zbl 1478.62166 · doi:10.1553/etna_vol55s142
[24] J. Hertrich, S. Neumayer and G. Steidl, Convolutional proximal neural networks and plug-and-play algorithms, Linear Algebra and its Applications, 631 (2021), 203-234. doi: 10.1016/j.laa.2021.09.004. · Zbl 1534.68203 · doi:10.1016/j.laa.2021.09.004
[25] F. Hickernell, I. Sloan and G. Wasilkowski, On tractability of weighted integration over bounded and unbounded regions in \(\mathbb{R}^s\), Mathematics of Computation, 73 (2004), 1885-1901. doi: 10.1090/S0025-5718-04-01624-2. · Zbl 1068.65005 · doi:10.1090/S0025-5718-04-01624-2
[26] F. J. Hickernell, I. H. Sloan and G. W. Wasilkowski, The strong tractability of multivariate integration using lattice rules, Monte Carlo and Quasi-Monte Carlo Methods 2002, Berlin, Heidelberg, Springer-Verlag, Berlin, 2004,259-273. · Zbl 1042.65004
[27] J. Hu, X. Liu, Z.-W. Wen and Y.-X. Yuan, A brief introduction to manifold optimization, Journal of the Operations Research Society of China, 8 (2020), 199-248. doi: 10.1007/s40305-020-00295-9. · Zbl 1474.49093 · doi:10.1007/s40305-020-00295-9
[28] A. Hurwitz, Über die Erzeugung der Invarianten durch Integration, Nachrichten von der Gesellschaft der Wissenschaften zu Göttingen, Mathematisch-Physikalische Klasse, 1897 (1897), 71-90. doi: 10.1007/978-3-0348-4160-3_38. · JFM 28.0103.03 · doi:10.1007/978-3-0348-4160-3_38
[29] M. G. Kapteyn, J. V. R. Pretorius and K. E. Willcox, A probabilistic graphical model foundation for enabling predictive digital twins at scale, Nature Computational Science, 1 (2021), 337-347.
[30] M. Kochurov, R. Karimov and S. Kozlukov, Geoopt: Riemannian optimization in pytorch, arXiv: 2005.02819, 2020.
[31] S. G. Krantz and H. R. Parks, A Primer of Real Analytic Functions, Birkhäuser Advanced Texts Basler Lehrbücher, Birkhäuser Boston, MA, 2nd ed. edition, 2002. doi: 10.1007/978-0-8176-8134-0. · Zbl 1015.26030 · doi:10.1007/978-0-8176-8134-0
[32] F. Kuo, D. Nuyens, L. Plaskota, I. Sloan and G. Wasilkowski, Infinite-dimensional integration and the multivariate decomposition method, Journal of Computational and Applied Mathematics, 326 (2017), 217-234. doi: 10.1016/j.cam.2017.05.031. · Zbl 1370.65013 · doi:10.1016/j.cam.2017.05.031
[33] F. Y. Kuo, I. H. Sloan, G. W. Wasilkowski and H. Wozniakowski, On decompositions of multivariate functions, Mathematics of Computation, 79 (2010), 953-966. doi: 10.1090/S0025-5718-09-02319-9. · Zbl 1196.41022 · doi:10.1090/S0025-5718-09-02319-9
[34] G. Li and T. K. Pong, Calculus of the exponent of Kurdyka-Łojasiewicz inequality and its applications to linear convergence of first-order methods, Foundations of Computational Mathematics, 18 (2018), 1199-1232. doi: 10.1007/s10208-017-9366-8. · Zbl 1405.90076 · doi:10.1007/s10208-017-9366-8
[35] T. Li, Y. Zhao, K. Yan, K. Zhou, C. Zhang and X. Zhang, Probabilistic graphical models in energy systems: A review, Building Simulation, 15 (2022), 699-728. doi: 10.1007/s12273-021-0849-9. · doi:10.1007/s12273-021-0849-9
[36] L. Lippert and D. Potts, Variable transformations in combination with wavelets and ANOVA for high-dimensional approximation, arXiv: 2207.12826, 2022.
[37] L. Lippert, D. Potts and T. Ullrich, Fast hyperbolic wavelet regression meets ANOVA, Numerische Mathematik, 154 (2023), 155-207. doi: 10.1007/s00211-023-01358-8. · Zbl 1528.41030 · doi:10.1007/s00211-023-01358-8
[38] T. Maehara and K. Murota, Algorithm for error-controlled simultaneous block-diagonalization of matrices, SIAM Journal on Matrix Analysis and Applications, 32 (2011), 605-620. doi: 10.1137/090779966. · Zbl 1227.65039 · doi:10.1137/090779966
[39] K. Murota, Y. Kanno, M. Kojima and S. Kojima, A numerical algorithm for block-diagonal decomposition of matrix \(\ast \)-algebras with application to semidefinite programming, Japan Journal of Industrial and Applied Mathematics, 27 (2010), 125-160. doi: 10.1007/s13160-010-0006-9. · Zbl 1204.65068 · doi:10.1007/s13160-010-0006-9
[40] F. Nestler, M. Stoll and T. Wagner, Learning in high-dimensional feature spaces using ANOVA-based fast matrix-vector multiplication, Foundations of Data Science, 4 (2022), 423-440. doi: 10.3934/fods.2022012. · Zbl 1496.65243 · doi:10.3934/fods.2022012
[41] T. J. O’Kane, D. Harries and M. A. Collier, Bayesian structure learning for climate model evaluation, Earth and Space Science Open Archive, 2023. doi: 10.22541/essoar.169603507.70356103/v1. · doi:10.22541/essoar.169603507.70356103/v1
[42] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai and S. Chintala, Pytorch: An imperative style, high-performance deep learning library, In Advances in Neural Information Processing Systems 32, Curran Associates, Inc., 2019, 8024-8035.
[43] D. Potts and M. Schmischke, Approximation of high-dimensional periodic functions with Fourier-based methods, SIAM Journal on Numerical Analysis, 59 (2021), 2393-2429. doi: 10.1137/20M1354921. · Zbl 1476.65022 · doi:10.1137/20M1354921
[44] D. Potts and M. Schmischke, Interpretable transformed ANOVA approximation on the example of the prevention of forest fires, Frontiers in Applied Mathematics and Statistics, 8 (2022), 795250. doi: 10.3389/fams.2022.795250. · doi:10.3389/fams.2022.795250
[45] Q. Rebjock and N. Boumal, Fast convergence to non-isolated minima: four equivalent conditions for \(C^2\) functions, arXiv: 2303.00096, 2023.
[46] R. Schneider and A. Uschmajew, Convergence results for projected line-search methods on varieties of low-rank matrices via Łojasiewicz inequality, SIAM Journal on Optimization, 25 (2015), 622-646. doi: 10.1137/140957822. · Zbl 1355.65079 · doi:10.1137/140957822
[47] H. H. Sohrab, Basic Real Analysis, Birkhäuser New York, NY, 2nd ed. edition, 2014. doi: 10.1007/978-1-4939-1841-6. · Zbl 1308.26006 · doi:10.1007/978-1-4939-1841-6
[48] S. Sullivant, Algebraic Statistics, Graduate Studies in Mathematics, 194, American Mathematical Society, Providence, Rhode Island, 2018. doi: 10.1090/gsm/194. · Zbl 1408.62004 · doi:10.1090/gsm/194
[49] Trilla-Fuertes et al., Multi-omics characterization of response to pd-1 inhibitors in advanced melanoma, Cancers, 15 (2023).
[50] K. Usevich, J. Li and P. Comon, Approximate matrix and tensor diagonalization by unitary transformations: Convergence of jacobi-type algorithms, SIAM Journal on Optimization, 30 (2020), 2998-3028. doi: 10.1137/19M125950X. · Zbl 1453.90168 · doi:10.1137/19M125950X
[51] M. Usvyatsov, R. Ballester-Ripoll and K. Schindler, tntorch: Tensor network learning with PyTorch, Journal of Machine Learning Research, 23 (2022), 1-6.
[52] S.-T. Yau, Non-existence of continuous convex functions on certain riemannian manifolds, Mathematische Annalen, 207 (1974), 269-270. doi: 10.1007/BF01351342. · Zbl 0261.53036 · doi:10.1007/BF01351342
[53] P. Yu, G. Li and T. K. Pong, Kurdyka-Łojasiewicz Exponent via Inf-projection, Foundations of Computational Mathematics, 22 (2022), 1171-1217. doi: 10.1007/s10208-021-09528-6. · Zbl 1501.90047 · doi:10.1007/s10208-021-09528-6
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.