
Parallelizing Strassen’s method for matrix multiplication on distributed-memory MIMD architectures. (English) Zbl 0839.68093

Summary: We present a parallel method for matrix multiplication on distributed-memory MIMD architectures based on Strassen’s method. Our timing tests, performed on a 56-node Intel Paragon, demonstrate that the potential of Strassen’s method, with a complexity of \(4.7M^{2.807}\), can be realized at the system level rather than at the node level, on which several earlier works have focused. The parallel efficiency is nearly perfect when the number of processors is a power of 7. The parallelized Strassen method appears to be consistently faster than traditional matrix multiplication methods of complexity \(2M^3\) coupled with the BMR method and the Ring method at the system level. The speed gain depends on the matrix order \(M\): about \(20\%\) for \(M \approx 1000\) and more than \(100\%\) for \(M \approx 5000\).
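The \(4.7M^{2.807}\) count comes from Strassen’s recursion, which forms the product of two \(M \times M\) matrices from seven (rather than eight) half-size block products, giving \(O(M^{\log_2 7})\) arithmetic. The sketch below is a minimal serial illustration of that recursion in Python/NumPy, not the authors’ distributed implementation; the `leaf` cutoff, below which the ordinary \(2M^3\) product is used, is an assumed tuning parameter.

```python
import numpy as np

def strassen(A, B, leaf=64):
    """Multiply square matrices of order a power of 2 using
    Strassen's 7-multiplication block recursion (O(M^2.807))."""
    n = A.shape[0]
    if n <= leaf:          # fall back to the ordinary O(M^3) product
        return A @ B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # Strassen's seven block products (the classical method needs eight)
    M1 = strassen(A11 + A22, B11 + B22, leaf)
    M2 = strassen(A21 + A22, B11, leaf)
    M3 = strassen(A11, B12 - B22, leaf)
    M4 = strassen(A22, B21 - B11, leaf)
    M5 = strassen(A11 + A12, B22, leaf)
    M6 = strassen(A21 - A11, B11 + B12, leaf)
    M7 = strassen(A12 - A22, B21 + B22, leaf)
    # Reassemble the four blocks of C = A @ B
    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C
```

In the paper's system-level setting, each of the seven recursive products can be assigned to a disjoint group of processors, which is why efficiency is best when the processor count is a power of 7.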

MSC:

68T20 Problem solving in the context of artificial intelligence (heuristics, search strategies, etc.)
68M99 Computer system organization

Software:

PUMMA; CLASSPACK

References:

[1] Golub, G. H.; Van Loan, C. F., Matrix Computations (1989), Johns Hopkins University Press: Johns Hopkins University Press Baltimore, MD · Zbl 0733.65016
[2] Pan, V., How can we speed up matrix multiplication?, SIAM Review, 26, 393-415 (1984) · Zbl 0563.65028
[3] Strassen, V., Gaussian elimination is not optimal, Numer. Math., 13, 354-356 (1969) · Zbl 0185.40101
[4] Coppersmith, D.; Winograd, S., Matrix multiplication via arithmetic progressions, (Proceedings of the Nineteenth Annual ACM Symposium on Theory of Computing (1987)), 1-6
[5] Winograd, S., Some remarks on fast multiplication of polynomials, (Traub, J., Complexity of Sequential and Parallel Numerical Algorithms (1973), Academic Press), 181-196 · Zbl 0273.65033
[6] Yau, S.-T.; Lu, Y. Y., Reducing the symmetric matrix eigenvalue problem to matrix multiplications, SIAM Journal on Scientific Computing, 14, 121-144 (1993) · Zbl 0771.65019
[7] Gallivan, K. A.; Plemmons, R. J.; Sameh, A. H., Parallel algorithms for dense linear algebra computations, SIAM Rev., 32, 54-135 (1990) · Zbl 0697.65010
[8] Manber, U., Introduction to Algorithms (1989), Addison-Wesley: Addison-Wesley Reading, MA · Zbl 0825.68397
[9] Bailey, D., Extra high speed matrix multiplication on the CRAY-2, SIAM J. Sci. Statist. Comput., 9, 603-607 (1988) · Zbl 0644.65030
[10] Bjørstad, P.; Manne, F.; Sørevik, T.; Vajteršic, M., Efficient matrix multiplication on SIMD computers, SIAM J. Matrix Anal. Appl., 13, 386-401 (1992) · Zbl 0757.65050
[11] D. Scott, Private communication, Supercomputer Systems Division of Intel Corporation, Beaverton, OR (March 1994).
[12] Choi, J.; Dongarra, J. J.; Walker, D. W., PUMMA: Parallel Universal Matrix Multiplication Algorithms on distributed memory concurrent computers, (Technical Report TM-12252 (August 1993), Oak Ridge National Laboratory) · Zbl 0874.68129
[13] Fox, G. C.; Hey, A. I.; Otto, S., Matrix algorithms on the hypercube I: Matrix multiplication, Parallel Computing, 4, 17 (1987) · Zbl 0632.65043
[14] Huss-Lederman, S.; Tsao, A.; Jacobson, E. M.; Zhang, G., Matrix multiplication on the Intel Touchstone Delta, (Technical Report TR-93-101 (February 1994), SRC)
[15] Kuck and Associates, CLASSPACK Basic Math Library User’s Guide (December 1992), Kuck & Associates: Kuck & Associates Champaign, IL
[16] Fox, G. C.; Johnson, M. A.; Lyzenga, G.; Otto, S. W.; Salmon, J.; Walker, D., Solving Problems on Concurrent Processors, Vol. 1: General Techniques and Regular Problems (1988), Prentice-Hall: Englewood Cliffs, NJ
[17] Higham, N. J., Exploiting fast matrix multiplication within the level 3 BLAS, ACM Trans. Math Software, 16, 112-115 (1990) · Zbl 0900.65118