×

On iterative QR pre-processing in the parallel block-Jacobi SVD algorithm. (English) Zbl 1204.68259

Summary: An efficient version of the parallel two-sided block-Jacobi algorithm for the singular value decomposition of an \(m\times n\) matrix \(A\) includes the pre-processing step, which consists of the QR factorization of \(A\) with column pivoting followed by the optional LQ factorization of the \(R\)-factor. Then the iterative two-sided block-Jacobi algorithm is applied in parallel to the \(R\)-factor (or \(L\)-factor). For the efficient computation of the parallel QR (or LQ) factorization with (or without) column pivoting implemented in the ScaLAPACK, some matrix block cyclic distribution on a process grid \(r\times c\) with \(p = r \times c, r, c \geqslant 1\), and block size \(n_b \times n_b\) is required so that all processors remain busy during the whole parallel QR (or LQ) factorization. Optimal values for parameters \(r, c\) and \(n_b\) are estimated experimentally using matrices of order \(n=4000\) and 8000, and the number of processors \(p=8\) and 16, respectively. It turns out that the optimal values are about \(n_b = 100\) and \(r \leqslant c\) with both \(r, c\) near to \(\sqrt{p}\). These parameters are then used in numerical experiments for six various distributions of singular values combined with well- \((\kappa =10^{1})\) and ill-conditioned matrices \((\kappa =10^{8})\).
It is shown that using optimal parameters in the pre-processing step, the parallel two-sided block-Jacobi SVD algorithm performs better (or equally well) than the ScaLAPACK routine PDGESVD for matrices with a multiple minimal/maximal singular value regardless to the condition number. For other distributions of singular values, our algorithm is slower than the ScaLAPACK. The un-pivoted QRLQ pre-processing step is then re-formulated and extended to the QR iteration, and its connection to the QR algorithm applied to specific symmetric, positive definite matrices is shown. This connection helps to explain observations in another set of experiments with a variable number of QR iteration steps. In general, the best results for all six distributions of singular values are achieved by using about six QR iteration steps in pre-processing.

MSC:

68W10 Parallel algorithms in computer science
68W99 Algorithms in computer science

Software:

ScaLAPACK
Full Text: DOI

References:

[1] Bečka, M.; Okša, G.; Vajteršic, M.: Dynamic ordering for a parallel block-Jacobi SVD algorithm, Parallel comput. 28, 243-262 (2002) · Zbl 0983.68251
[2] Chan, T. F.: Rank revealing QR factorizations, Linear algebra appl. 88/89, 67-82 (1987) · Zbl 0624.65025
[3] Chandrasekaran, S.; Ipsen, I. C. F.: Analysis pf a QR algorithm for computing singular values, SIAM J. Matrix anal. Appl. 16, No. 2, 520-535 (1995) · Zbl 0827.65040
[4] Choi, J.; Dongarra, J. J.; Ostrouchov, L. S.; Petitet, A. P.; Walker, D. W.; Whaley, R. C.: The design and implementation of the scalapack Lu, QR and Cholesky factorization routines, Sci. program. 5, 173-184 (1996)
[5] Hong, Y. P.; Pan, C. -T.: Rank-revealing QR factorizations and the singular value decomposition, Math. comput. 58, 213-232 (1992) · Zbl 0743.65037 · doi:10.2307/2153029
[6] Huckaby, D. A.; Chan, T. F.: On the convergence of stewart’s QLP algorithm for approximating the SVD, Numer. algorithms 32, 287-316 (2003) · Zbl 1035.65034 · doi:10.1023/A:1024082314087
[7] Okša, G.; Vajteršic, M.: Efficient pre-processing in the parallel block-Jacobi SVD algorithm, Parallel comput. 32, 166-176 (2006)
[8] Stewart, G. W.: The QLP approximation to the singular value decomposition, SIAM J. Sci. comput. 20, 1336-1348 (1999) · Zbl 0939.65062 · doi:10.1137/S1064827597319519
[9] Watkins, D. S.: The matrix eigenvalue problem, (2007) · Zbl 1142.65038
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.