Algorithms for efficient reproducible floating point summation. (English) Zbl 1484.65100


MSC:

65G50 Roundoff error
65B10 Numerical summation of series

Software:

CUBLAS; ExBLAS; MKL

References:

[1] IEEE. 2008. IEEE standard for floating-point arithmetic. IEEE Std 754-2008 (Aug. 2008), 1-70. DOI:https://doi.org/10.1109/IEEESTD.2008.4610935
[2] Intel. 2018. Developer Reference for Intel® Math Kernel Library 2018 - C | Intel® Software. Retrieved from https://software.intel.com/en-us/download/developer-reference-for-intel-math-kernel-library-2018-c.
[3] NVIDIA. 2018. NVIDIA® cuBLAS. Retrieved from http://docs.nvidia.com/cuda/cublas/index.html.
[4] Intel. 2019. bfloat16 - Hardware Numerics Definition. Retrieved from https://software.intel.com/sites/default/files/managed/40/8b/bf16-hardware-numerics-definition-white-paper.pdf.
[5] IEEE. 2019. IEEE standard for floating-point arithmetic. IEEE Std 754-2019 (July 2019), 1-84. DOI:https://doi.org/10.1109/IEEESTD.2019.8766229
[6] R. Alverson. 1991. Integer division using reciprocals. In Proceedings of the Symposium on Computer Arithmetic (ARITH’91). 186-190. DOI:https://doi.org/10.1109/ARITH.1991.145558
[7] A. Arteaga, O. Fuhrer, and T. Hoefler. 2014. Designing bit-reproducible portable high-performance applications. In Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS’14). 1235-1244. DOI:https://doi.org/10.1109/IPDPS.2014.127
[8] C. Chohra, P. Langlois, and D. Parello. 2015. Efficiency of reproducible level 1 BLAS. In Proceedings of the Conference on Scientific Computing, Computer Arithmetic, and Validated Numerics (SCAN’15). Springer, Cham, 99-108. DOI:https://doi.org/10.1007/978-3-319-31769-4_8 · Zbl 1354.65091
[9] C. Chohra, P. Langlois, and D. Parello. 2016. Reproducible, accurately rounded and efficient BLAS. In Proceedings of the Euro-Par Parallel Processing Workshops. Springer, Cham, 609-620. DOI:https://doi.org/10.1007/978-3-319-58943-5_49 · Zbl 1354.65091
[10] S. Collange, D. Defour, S. Graillat, and R. Iakymchuk. 2015. Numerical reproducibility for the parallel reduction on multi- and many-core architectures. Parallel Comput. 49 (Nov. 2015), 83-97. DOI:https://doi.org/10.1016/j.parco.2015.09.001
[11] T. J. Dekker. 1971. A floating-point technique for extending the available precision. Numer. Math. 18, 3 (June 1971), 224-242. DOI:https://doi.org/10.1007/BF01397083 · Zbl 0226.65034
[12] J. Demmel, G. Gopalakrishnan, M. Heroux, W. Keyrouz, and K. Sato. 2015. Reproducibility of high performance codes and simulations: Tools, techniques, debugging. In Proceedings of the SC 2015 Birds of a Feather Sessions. Retrieved from https://gcl.cis.udel.edu/sc15bof.php.
[13] J. Demmel and Y. Hida. 2004. Accurate and efficient floating point summation. SIAM J. Sci. Comput. 25, 4 (Jan. 2004), 1214-1248. DOI:https://doi.org/10.1137/S1064827502407627 · Zbl 1061.65039
[14] J. Demmel and H. D. Nguyen. 2013. Fast reproducible floating-point summation. In Proceedings of the Symposium on Computer Arithmetic (ARITH’13). 163-172. DOI:https://doi.org/10.1109/ARITH.2013.9
[15] J. Demmel and H. D. Nguyen. 2015. Parallel reproducible summation. IEEE Trans. Comput. 64, 7 (July 2015), 2060-2070. DOI:https://doi.org/10.1109/TC.2014.2345391 · Zbl 1360.68042
[16] T. Granlund and P. L. Montgomery. 1994. Division by invariant integers using multiplication. In Proceedings of the Conference on Programming Language Design and Implementation (PLDI’94). 61-72. DOI:https://doi.org/10.1145/178243.178249
[17] Y. Hida, X. S. Li, and D. H. Bailey. 2001. Algorithms for quad-double precision floating point arithmetic. In Proceedings of the Symposium on Computer Arithmetic (ARITH’01). 155-162. DOI:https://doi.org/10.1109/ARITH.2001.930115
[18] N. Higham. 1993. The accuracy of floating point summation. SIAM J. Sci. Comput. 14, 4 (July 1993), 783-799. DOI:https://doi.org/10.1137/0914050 · Zbl 0788.65053
[19] N. Higham. 2002. Accuracy and Stability of Numerical Algorithms (2nd ed.). Society for Industrial and Applied Mathematics. DOI:https://doi.org/10.1137/1.9780898718027 · Zbl 1011.65010
[20] D. G. Hough. 2019. The IEEE standard 754: One for the history books. Computer 52, 12 (Dec. 2019), 109-112. DOI:https://doi.org/10.1109/MC.2019.2926614
[21] R. Iakymchuk, S. Collange, D. Defour, and S. Graillat. 2015. ExBLAS: Reproducible and accurate BLAS library. In Proceedings of the SC 2015 Numerical Reproducibility at Exascale Workshops (NRE’15). Retrieved from https://hal.archives-ouvertes.fr/hal-01202396.
[22] R. Iakymchuk, D. Defour, S. Collange, and S. Graillat. 2015. Reproducible and accurate matrix multiplication. In Proceedings of the Conference on Scientific Computing, Computer Arithmetic, and Validated Numerics (SCAN’15). Springer, Cham, 126-137. DOI:https://doi.org/10.1007/978-3-319-31769-4_11 · Zbl 1354.65082
[23] R. Iakymchuk, D. Defour, S. Collange, and S. Graillat. 2015. Reproducible triangular solvers for high-performance computing. In Proceedings of the International Conference on Information Technology - New Generations (ITNG’15). 353-358. DOI:https://doi.org/10.1109/ITNG.2015.63
[24] W. Kahan. 1965. Pracniques: Further remarks on reducing truncation errors. Commun. ACM 8, 1 (Jan. 1965). DOI:https://doi.org/10.1145/363707.363723
[25] D. E. Knuth. 1969. The Art of Computer Programming 2: Seminumerical Algorithms. Addison-Wesley, Reading, MA. · Zbl 0191.18001
[26] U. Kulisch. 2012. Computer Arithmetic and Validity: Theory, Implementation, and Applications (2nd ed.). Walter de Gruyter. · Zbl 1277.65028
[27] D. R. Lutz and C. N. Hinds. 2017. High-precision anchored accumulators for reproducible floating-point summation. In Proceedings of the Symposium on Computer Arithmetic (ARITH’17). 98-105. DOI:https://doi.org/10.1109/ARITH.2017.20
[28] J.-M. Muller, N. Brunie, F. de Dinechin, C.-P. Jeannerod, M. Joldes, V. Lefèvre, G. Melquiond, N. Revol, and S. Torres. 2018. Handbook of Floating-Point Arithmetic (2nd ed.). Birkhäuser Basel. Retrieved from http://www.springer.com/us/book/9783319765259. · Zbl 1394.65001
[29] J. Riedy and J. Demmel. 2018. Augmented arithmetic operations proposed for IEEE-754 2018. In Proceedings of the Symposium on Computer Arithmetic (ARITH’18). 45-52. DOI:https://doi.org/10.1109/ARITH.2018.8464813
[30] S. M. Rump. 2009. Ultimately fast accurate summation. SIAM J. Sci. Comput. 31, 5 (Jan. 2009), 3466-3502. DOI:https://doi.org/10.1137/080738490 · Zbl 1202.65033
[31] S. M. Rump, T. Ogita, and S. Oishi. 2010. Fast high precision summation. Nonlin. Theor. Applic. IEICE 1 (2010), 2-24. DOI:https://doi.org/10.1587/nolta.1.2