Abstract
Krylov subspace methods (KSMs) are popular numerical tools for solving large, sparse linear systems of equations. We consider their role on future massively parallel distributed-memory machines by estimating the future performance of their constituent operations. To this end we construct a performance model that is simple, yet takes network topology and network acceleration into account, as both are important considerations. We show that, as the number of nodes in a parallel machine grows very large, the increasing latency of global reductions may well become a serious bottleneck for traditional formulations of these methods. Finally, we discuss how pipelined KSMs can mitigate this potential problem, and which pipeline depths are appropriate.
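The scaling argument in the abstract can be illustrated with a minimal sketch. Assuming a binary-tree reduction (one network hop per tree level) with a hypothetical per-hop latency, reduction cost grows logarithmically with node count, while the local work per node shrinks under strong scaling; the pipeline depth a pipelined KSM would need is then roughly the number of local matrix-vector products required to cover one reduction. The functions, parameter names, and latency values below are illustrative assumptions, not the paper's actual model:

```python
import math

def reduction_latency(nodes, hop_latency_us=1.0):
    """Latency of a binary-tree allreduce over `nodes` nodes:
    one hop per tree level, i.e. ceil(log2(nodes)) hops.
    `hop_latency_us` is an assumed per-hop cost, not a measurement."""
    return math.ceil(math.log2(nodes)) * hop_latency_us

def pipeline_depth(nodes, spmv_time_us, hop_latency_us=1.0):
    """Smallest depth l such that l sparse matrix-vector products
    (each taking `spmv_time_us`) cover one allreduce's latency."""
    return max(1, math.ceil(reduction_latency(nodes, hop_latency_us)
                            / spmv_time_us))

# Reduction latency doubles for every squaring of the node count,
# so the depth needed to hide it grows as the machine scales out.
for p in (1024, 65536, 1048576):
    print(p, reduction_latency(p), pipeline_depth(p, spmv_time_us=5.0))
```

Under these toy numbers, a million-node machine needs a deeper pipeline than a thousand-node one simply because the reduction tree is twice as tall, which is the qualitative effect the paper's model quantifies.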
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
Cite this paper
Ashby, T.J., Ghysels, P., Heirman, W., Vanroose, W. (2012). The Impact of Global Communication Latency at Extreme Scales on Krylov Methods. In: Xiang, Y., Stojmenovic, I., Apduhan, B.O., Wang, G., Nakano, K., Zomaya, A. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2012. Lecture Notes in Computer Science, vol 7439. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33078-0_31
Print ISBN: 978-3-642-33077-3
Online ISBN: 978-3-642-33078-0