Abstract
Krylov subspace methods (KSMs) are popular numerical tools for solving large, sparse linear systems of equations. We consider their role on future massively parallel distributed-memory machines by estimating the future performance of their constituent operations. To this end we construct a performance model that is simple, yet takes network topology and network acceleration into account, as both are important considerations. We show that, as the number of nodes in a parallel machine grows very large, the increasing latency of global reductions may well become a serious bottleneck for traditional formulations of these methods. Finally, we discuss how pipelined KSMs can mitigate this potential problem, and which pipeline depths are appropriate.
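The scaling argument in the abstract can be illustrated with a minimal sketch. Assuming a binary-tree reduction (one network hop per tree level) with a hypothetical per-hop latency, reduction cost grows logarithmically with node count, while the local work per node shrinks under strong scaling; the pipeline depth a pipelined KSM would need is then roughly the number of local matrix-vector products required to cover one reduction. The functions, parameter names, and latency values below are illustrative assumptions, not the paper's actual model:

```python
import math

def reduction_latency(nodes, hop_latency_us=1.0):
    """Latency of a binary-tree allreduce over `nodes` nodes:
    one hop per tree level, i.e. ceil(log2(nodes)) hops.
    `hop_latency_us` is an assumed per-hop cost, not a measurement."""
    return math.ceil(math.log2(nodes)) * hop_latency_us

def pipeline_depth(nodes, spmv_time_us, hop_latency_us=1.0):
    """Smallest depth l such that l sparse matrix-vector products
    (each taking `spmv_time_us`) cover one allreduce's latency."""
    return max(1, math.ceil(reduction_latency(nodes, hop_latency_us)
                            / spmv_time_us))

# Reduction latency doubles for every squaring of the node count,
# so the depth needed to hide it grows as the machine scales out.
for p in (1024, 65536, 1048576):
    print(p, reduction_latency(p), pipeline_depth(p, spmv_time_us=5.0))
```

Under these toy numbers, a million-node machine needs a deeper pipeline than a thousand-node one simply because the reduction tree is twice as tall, which is the qualitative effect the paper's model quantifies.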
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
Cite this paper
Ashby, T.J., Ghysels, P., Heirman, W., Vanroose, W. (2012). The Impact of Global Communication Latency at Extreme Scales on Krylov Methods. In: Xiang, Y., Stojmenovic, I., Apduhan, B.O., Wang, G., Nakano, K., Zomaya, A. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2012. Lecture Notes in Computer Science, vol 7439. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33078-0_31
Print ISBN: 978-3-642-33077-3
Online ISBN: 978-3-642-33078-0