×

Communication lower bounds of bilinear algorithms for symmetric tensor contractions. (English) Zbl 1487.65051

Summary: We introduce a new theoretical framework for deriving lower bounds on data movement in bilinear algorithms. Bilinear algorithms are a general representation of fast algorithms for bilinear functions, which include computation of matrix multiplication, convolution, and symmetric tensor contractions. A bilinear algorithm is described by three matrices. Our communication lower bounds are based on quantifying the minimal matrix ranks of matching subsets of columns of these matrices. This infrastructure yields new lower bounds for symmetric tensor contraction algorithms, which provide new qualitative insights. Tensor symmetry (invariance under permutation of modes) is common to many applications of tensor computations (e.g., tensor representation of hypergraphs, analysis of high-order moments in data, as well as tensors modeling interactions of electrons in computational chemistry). Tensor symmetry enables reduction in representation size as well as contraction cost by factors that scale with the number of equivalent permutations. However, we derive lower bounds showing that these cost and memory reductions can necessitate increases in data movement by factors that scale with the size of the tensors.

MSC:

65F99 Numerical linear algebra
15A69 Multilinear algebra, tensor calculus
65Y05 Parallel numerical computation
68Q25 Analysis of algorithms and problem complexity

Software:

BLAS

References:

[1] A. Aggarwal, A. K. Chandra, and M. Snir, On communication latency in PRAM computations, in Proceedings of the First Annual ACM Symposium on Parallel Algorithms and Architectures, ACM, 1989, pp. 11-21.
[2] A. Alexandrov, M. F. Ionescu, K. E. Schauser, and C. Scheiman, LogGP: Incorporating long messages into the LogP model-one step closer towards a realistic model for parallel computation, in Proceedings of the Seventh Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA ’95), Santa Barbara, CA, 1995, ACM, pp. 95-105.
[3] G. Ballard, J. Demmel, O. Holtz, and O. Schwartz, Minimizing communication in numerical linear algebra, SIAM J. Matrix Anal. Appl., 32 (2011), pp. 866-901, https://doi.org/10.1137/090769156. · Zbl 1246.68128
[4] G. Ballard, J. Demmel, O. Holtz, and O. Schwartz, Graph expansion and communication costs of fast matrix multiplication, J. ACM, 59 (2013), 32. · Zbl 1281.68241
[5] G. Ballard, N. Knight, and K. Rouse, Communication lower bounds for matricized tensor times Khatri-Rao product, in Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS), IEEE, 2018, pp. 557-567.
[6] R. J. Bartlett, Many-body perturbation theory and coupled cluster theory for electron correlation in molecules, 32 (1981), pp. 359-401.
[7] J. Bennett, A. Carbery, M. Christ, and T. Tao, The Brascamp-Lieb inequalities: Finiteness, structure and extremals, Geom. Funct. Anal., 17 (2008), pp. 1343-1415. · Zbl 1132.26006
[8] G. Bilardi and L. De Stefani, The I/O Complexity of Strassen’s Matrix Multiplication with Recomputation, preprint, https://arxiv.org/abs/1605.02224, 2016. · Zbl 1491.68274
[9] Y. Chen, L. Qi, and X. Zhang, The Fiedler vector of a Laplacian tensor for hypergraph partitioning, SIAM J. Sci. Comput., 39 (2017), pp. A2508-A2537, https://doi.org/10.1137/16M1094828. · Zbl 1375.05184
[10] M. Christ, J. Demmel, N. Knight, T. Scanlon, and K. Yelick, Communication Lower Bounds and Optimal Algorithms for Programs That Reference Arrays-Part 1, preprint, https://arxiv.org/abs/1308.0068, 2013.
[11] P. Comon, G. Golub, L.-H. Lim, and B. Mourrain, Symmetric tensors and symmetric tensor rank, SIAM J. Matrix Anal. Appl., 30 (2008), pp. 1254-1279, https://doi.org/10.1137/060661569. · Zbl 1181.15014
[12] D. Culler, R. Karp, D. Patterson, A. Sahay, K. E. Schauser, E. Santos, R. Subramonian, and T. von Eicken, LogP: Towards a realistic model of parallel computation, in Proceedings of the Fourth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPOPP ’93), San Diego, CA, 1993, ACM, pp. 1-12.
[13] J. Demmel, D. Eliahu, A. Fox, S. Kamil, B. Lipshitz, O. Schwartz, and O. Spillinger, Communication-optimal parallel recursive rectangular matrix multiplication, in Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed Processing (IPDPS), IEEE, 2013, pp. 261-272.
[14] E. Di Napoli, D. Fabregat-Traver, G. Quintana-Ortí, and P. Bientinesi, Towards an efficient use of the BLAS library for multilinear tensor contractions, Appl. Math. Comput., 235 (2014), pp. 454-468. · Zbl 1336.65076
[15] E. Epifanovsky, M. Wormit, T. Kus, A. Landau, D. Zuev, K. Khistyaev, P. Manohar, I. Kaliman, A. Dreuw, and A. I. Krylov, New implementation of high-level correlated methods using a general block tensor library for high-performance electronic structure calculations, J. Comput. Chem., 34 (2013), pp. 2293-2309.
[16] O. Hölder, Über einen Mittelwertsatz, Nachr. Ges. Wiss. Göttingen, (1889), pp. 38-47. · JFM 21.0260.07
[17] J.-W. Hong and H. T. Kung, I/O complexity: The red-blue pebble game, in Proceedings of the Thirteenth Annual ACM Symposium on Theory of Computing (STOC ’81), Milwaukee, WI, 1981, ACM, pp. 326-333.
[18] D. Irony, S. Toledo, and A. Tiskin, Communication lower bounds for distributed-memory matrix multiplication, J. Parallel Distrib. Comput., 64 (2004), pp. 1017-1026. · Zbl 1114.68081
[19] T. G. Kolda and B. W. Bader, Tensor decompositions and applications, SIAM Rev., 51 (2009), pp. 455-500, https://doi.org/10.1137/07070111X. · Zbl 1173.65029
[20] C. L. Lawson, R. J. Hanson, D. R. Kincaid, and F. T. Krogh, Basic linear algebra subprograms for Fortran usage, ACM Trans. Math. Software (TOMS), 5 (1979), pp. 308-323. · Zbl 0412.65022
[21] L. H. Loomis and H. Whitney, An inequality related to the isoperimetric inequality, Bull. AMS, 55 (1949), pp. 961-962. · Zbl 0035.38302
[22] J. Noga and P. Valiron, Improved algorithm for triple-excitation contributions within the coupled cluster approach, Molecular Physics, 103 (2005), pp. 2123-2130.
[23] V. Pan, How can we speed up matrix multiplication?, SIAM Rev., 26 (1984), pp. 393-415, https://doi.org/10.1137/1026076. · Zbl 0563.65028
[24] S. Rajbhandari, A. Nikam, P.-W. Lai, K. Stock, S. Krishnamoorthy, and P. Sadayappan, A communication-optimal framework for contracting distributed tensors, in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, (SC ’14), New Orleans, LA, IEEE, 2014, pp. 375-386.
[25] J. N. Scott, An I/O-Complexity Lower Bound for All Recursive Matrix Multiplication Algorithms by Path-Routing, Ph.D. thesis, University of California, Berkeley, 2015.
[26] S. Sherman and T. G. Kolda, Estimating higher-order moments using symmetric tensor decomposition, SIAM J. Matrix Anal. Appl., 41 (2020), pp. 1369-1387, https://doi.org/10.1137/19M1299633. · Zbl 1467.15022
[27] E. Solomonik, Provably Efficient Algorithms for Numerical Tensor Algebra, Ph.D. thesis, University of California, Berkeley, 2014.
[28] E. Solomonik, E. Carson, N. Knight, and J. Demmel, Tradeoffs between synchronization, communication, and computation in parallel linear algebra computations, ACM Trans. Parallel Comput. 3 (2016), 3.
[29] E. Solomonik and J. Demmel, Contracting Symmetric Tensors Using Fewer Multiplications, Tech. report, ETH Zürich, 2015.
[30] E. Solomonik and J. Demmel, Fast bilinear algorithms for symmetric tensor contractions, Comput. Methods Appl. Math., 21 (2021), pp. 211-231. · Zbl 1487.65051
[31] E. Solomonik, D. Matthews, J. Hammond, and J. Demmel, Cyclops tensor framework: Reducing communication and eliminating load imbalance in massively parallel contractions, in Proceedings of the 2013 IEEE 27th International Symposium on Parallel Distributed Processing (IPDPS), Cambridge, MA, 2013, pp. 813-824.
[32] E. Solomonik, D. Matthews, J. R. Hammond, J. F. Stanton, and J. Demmel, A massively parallel tensor contraction framework for coupled-cluster computations, J. Parallel Distrib. Comput., 74 (2014), pp. 3176 – 3190.
[33] V. Strassen, Gaussian elimination is not optimal, Numer. Math., 13 (1969), pp. 354-356. · Zbl 0185.40101
[34] A. Tiskin, The Design and Analysis of Bulk-Synchronous Parallel Algorithms, Ph.D. thesis, University of Oxford, 1998. · Zbl 0902.68072
[35] L. G. Valiant, A bridging model for parallel computation, Comm. ACM, 33 (1990), pp. 103-111.
[36] J. Čížek, On the correlation problem in atomic and molecular systems: Calculation of wavefunction components in Ursell-type expansion using quantum-field theoretical methods, J. Chem. Phys., 45 (1966), pp. 4256-4266.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.