×

On optimization of finite-difference time-domain (FDTD) computation on heterogeneous and GPU clusters. (English) Zbl 1219.68055

Summary: A model for the computational cost of the finite-difference time-domain (FDTD) method irrespective of implementation details or the application domain is given. The model is used to formalize the problem of optimal distribution of computational load to an arbitrary set of resources across a heterogeneous cluster. We show that the problem can be formulated as a minimax optimization problem and derive analytic lower bounds for the computational cost. The work provides insight into optimal design of FDTD parallel software. Our formulation of the load distribution problem takes simultaneously into account the computational and communication costs. We demonstrate that significant performance gains, as much as 75%, can be achieved by proper load distribution.

MSC:

68M14 Distributed systems
65N06 Finite difference methods for boundary value problems involving PDEs

Software:

SQPlab; CUDA; PLCP
Full Text: DOI

References:

[1] S. Adams, J. Payne, R. Boppana, Finite difference time domain (FDTD) simulations using graphics processors, in: High Performance Computing Modernization Program Users Group Conference, 2007, pp. 334–338.
[2] Beaumont, O.; Boudet, V.; Rastello, F.; Robert, Y.: Matrix multiplication on heterogeneous platforms, IEEE transactions on parallel and distributed systems 12, No. 10, 1033-1051 (2001)
[3] Berenger, J.: A perfectly matched layer for the absorption of electromagnetic waves, Journal of computational physics 114, No. 2, 185-200 (1994) · Zbl 0814.65129 · doi:10.1006/jcph.1994.1159
[4] Bonnans, J. F.; Gilbert, J. C.; Lemaréchal, C.; Sagastizábal, C. A.: Numerical optimization: theoretical and practical aspects, (2006) · Zbl 1108.65060
[5] W. Chen, P. Kosmas, M. Leeser, C. Rappaport, An FPGA implementation of the two-dimensional finite-difference time-domain (FDTD) algorithm, in: Proc. International Symposium on Field Programmable Gate Arrays, 2004, pp. 213–222.
[6] P.E. Crandall, M.J. Quinn, Block data decomposition for data-parallel programming on a heterogeneous workstation network, in: Proc. Int. Symp. on High Performance Distributed Computing, 1993, pp. 42–49.
[7] Compute Unified Device Architecture (CUDA) Programming Guide, version 2.2. NVIDIA, 2009. http://developer.nvidia.com/object/cuda.html.
[8] Dongarra, J.; Lastovetsky, A.: An overview of heterogeneous high performance and grid computing, Engineering the grid (2006)
[9] J.P. Durbano, J.R. Humphrey, F.E. Ortiz, P.F. Curt, D.W. Prather, M.S. Mirotznik, Hardware acceleration of the 3D finite-difference time-domain method, in: Proc. IEEE Antennas and Propagation Society Int. Symposium, vol. 1, 2004, pp. 77–80.
[10] Gedney, S. D.: An anisotropic perfectly matched layer-absorbing medium for the truncation of FDTD lattices, IEEE transactions on antennas and propagation 44, No. 12, 1630-1639 (1996)
[11] Gropp, W.; Lusk, E.; Skjellum, A.: Using MPI: portable parallel programming with the message passing interface, (1999) · Zbl 0875.68206
[12] Guiffaut, C.; Mahdjoubi, K.: A parallel FDTD algorithm using the MPI library, IEEE antennas and propagation magazine 43, No. 2, 94-103 (2001)
[13] M.C. Hughes, M.A. Stuchly, Hybrid parallel finite difference time domain simulation of nanoscale optical phenomena, in: Int. Conf. on Wireless Communications and Applied Computational Electromagnetics, 2005, pp. 132–135.
[14] Kaddoura, M.; Ranka, S.; Wang, A.: Array decompositions for nonuniform computational environments, Journal of parallel and distributed computing 36, No. 2, 91-105 (1996)
[15] N. Karmarker, R.M. Karp, The differencing method of set partitioning, Technical Report, University of California at Berkeley, Berkeley, CA, USA, 1983.
[16] Kawaguchi, H.; Takahara, K.; Yamauchi, D.: Design study of ultrahigh-speed microwave simulator engine, IEEE transactions on magnetics 38, No. 2, 689-692 (2002)
[17] Korf, R.: A complete anytime algorithm for number partitioning, Artificial intelligence 106, No. 2, 181-203 (1998) · Zbl 0910.68084 · doi:10.1016/S0004-3702(98)00086-1
[18] S.E. Krakiwsky, L.E. Turner, M.M. Okoniewski, Acceleration of finite-difference time-domain (FDTD) using graphics processor units (GPU), in: IEEE Int. Microwave Symposium, vol. 2, 2004, pp. 1033–1036.
[19] A. Lastovetsky, On grid-based matrix partitioning for heterogeneous processors, in: Proc. Int. Symp. on Parallel and Distributed Computing, ISPDC, 2007, pp. 383–390.
[20] Lastovetsky, A. L.; Dongarra, J.: High performance heterogeneous computing, (2009)
[21] D. Liuge, L. Kang, K. Fanmin, Parallel 3D finite difference time domain simulations on graphics processors with CUDA, in: Int. Conf. on Computational Intelligence and Software Engineering, December 2009, pp. 1–4.
[22] Mertens, S.: The easiest hard problem: number partitioning, Computational complexity and statistical physics, 125-139 (2006) · Zbl 1156.82321
[23] Mur, G.: Absorbing boundary conditions for the finite-difference approximation of the time-domain electromagnetic-field equations, IEEE transactions on electromagnetic compatibility 23, No. 4, 377-382 (1981)
[24] OpenMP Application Programming Interface, version 3.0. 2009. OpenMP: http://openmp.org/wp/openmp-specifications/.
[25] Pinton, G. F.; Dahl, J.; Rosenzweig, S.; Trahey, G. E.: A heterogeneous nonlinear attenuating full-wave model of ultrasound, IEEE transactions on ultrasonics, ferroelectrics and frequency control 56, No. 3, 474-488 (2009)
[26] Roden, J. A.; Gedney, S. D.: Convolutional PML (CPML): an efficient FDTD implementation of the CFS-PML for arbitrary media, Microwave and optical technology letters 27, No. 5, 334-339 (2000)
[27] Sacks, Z. S.; Kingsland, D. M.; Lee, R.; Lee, J. F.: A perfectly matched anisotropic absorber for use as an absorbing boundary condition, IEEE transactions on antennas and propagation 43, No. 12, 1460-1463 (1995)
[28] T.P. Stefanski, T.D. Drysdale, Acceleration of the 3D ADI-FDTD method using graphics processor units, in: IEEE Int. Microwave Symposium, 2009, pp. 241–244.
[29] Taflove, A.; Hagness, S. C.: Computational electrodynamics: the finite difference time domain method, (2005) · Zbl 0963.78001
[30] N. Takada, T. Shimobaba, N. Masuda, T. Ito, High-speed FDTD simulation algorithm for GPU with compute unified device architecture, in: Proc. IEEE Antennas and Propagation Society Int. Symposium, 2009, pp. 1–4.
[31] Yee, K.: Numerical solution of initial boundary value problems involving Maxwell’s equations in isotropic media, IEEE transactions on antennas and propagation 14, 302-307 (1966) · Zbl 1155.78304 · doi:10.1109/TAP.1966.1138693
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.