Abstract
In this paper we continue our investigations started in [8] into the effects of using different synchronization mechanisms in OpenMP-threaded iterative mesh optimization algorithms. We port our test code to the Intel® Xeon® processor (former codename “Haswell”) by employing a user-guided locking API for OpenMP [4] that provides a general and unified user interface and runtime framework. Since the Intel® Transactional Synchronization Extensions (TSX) provide two different options for speculation — Hardware Lock Elision (HLE) and Restricted Transactional Memory (RTM) — we compare a total of four different run modes: (i) HLE, (ii) RTM, (iii) OpenMP critical, and (iv) “unsynchronized”. As we did in [8], we find that either speculative execution option always outperforms the other two modes in terms of their convergence characteristics. Even with their higher overhead, the TSX options are very competitive when it comes to runtime performance measured with the “time-to-convergence” criterion introduced in [8].
References
Intel ARK. http://ark.intel.com
Intel® Threading Building Blocks. https://www.threadingbuildingblocks.org
LLVM. http://www.llvm.org
Bae, H., Cownie, J., Klemm, M., Terboven, C.: A user-guided locking API for the OpenMP* application program interface. In: DeRose, L., de Supinski, B.R., Olivier, S.L., Chapman, B.M., Müller, M.S. (eds.) IWOMP 2014. LNCS, vol. 8766, pp. 173–186. Springer, Heidelberg (2014)
Baker, A.H., Falgout, R.D., Kolev, T.V., Yang, U.M.: Multigrid smoothers for ultraparallel computing. SIAM J. Sci. Comput. 33, 2864–2887 (2011)
Bihari, B.L.: Applicability of transactional memory to modern codes. In: International Conference on Numerical Analysis and Applied Mathematics 2010 (ICNAAM 2010) Conference Proceedings, pp. 1764–1767. APS, Rodos (2010)
Bihari, B.L.: Transactional memory for unstructured mesh simulations. J. Sci. Comput. 54, 311–332 (2012)
Bihari, B.L., Wong, M., de Supinski, B.R., Diachin, L.: On the algorithmic aspects of using OpenMP synchronization mechanisms: the effects of transactional memory. In: DeRose, L., de Supinski, B.R., Olivier, S.L., Chapman, B.M., Müller, M.S. (eds.) IWOMP 2014. LNCS, vol. 8766, pp. 115–129. Springer, Heidelberg (2014)
Bihari, B.L., Wong, M., Wang, A., de Supinski, B.R., Chen, W.: A case for including transactions in OpenMP II: hardware transactional memory. In: Chapman, B.M., Massaioli, F., Müller, M.S., Rorro, M. (eds.) IWOMP 2012. LNCS, vol. 7312, pp. 44–58. Springer, Heidelberg (2012)
Drepper, U., Molnar, I.: The native POSIX thread library for Linux. Technical report, Redhat (2003)
IBM Compiler Group: IBM XL C/C++ for Blue Gene/Q, V12.1 Compiler Reference (2012)
Haring, R.A., Ohmacht, M., Fox, T.W., Gschwind, M.K., Satterfield, D.L., Sugavanam, K., Coteus, P.W., Heidelberger, P., Blumrich, M.A., Wisniewski, R.W., Gara, A., Chiu, G.L.-T., Boyle, P.A., Christ, N.H., Kim, C.: The IBM blue gene/Q compute chip. IEEE Micro 32(2), 48–60 (2013)
Herlihy, M., Moss, J.E.B.: Transactional memory: architectural support for lock-free data structures. SIGARCH Comput. Archit. News 21(2), 289–300 (1993)
Intel Corporation: Intel® Architecture Instruction Set Extensions Programming Reference. Document number 319433–014 (2012)
Intel Corporation: Intel® OpenMP* Runtime Library (2015). http://www.openmprtl.org/
Jacobi, C., Slegel, T., Greiner, D.: Transactional memory architecture and implementation for IBM system Z. In: 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 25–36, December 2012
Kleen, A.: Lock Elision in the GNU C library. LWN.net 12(1), (2013). http://lwn.net/Articles/534758/
Knupp, P.: Hexahedral and tetrahedral mesh shape optimization. Intl. J. Numer. Meth. Engr. 58, 319–332 (2003)
Le, H.Q., Guthrie, G.L., Williams, D.E., Michael, M.M., Frey, B.G., Starke, W.J., May, C., Odaira, R., Nakaike, T.: Transactional memory support in the IBM power8 processor. IBM J. Res. Dev. 59(1), 8:1–8:14 (2015)
Miller, D.: The GNU C Library version 2.18 is now available. Announcement on the info-gnu mailing list (2013). http://lists.gnu.org/archive/html/info-gnu/2013-08/msg00003.html
OpenMP Architecture Review Board: OpenMP Application Program Interface, Version 4.0 (2013). http://www.openmp.org/
Schindewolf, M., Gyllenhaal, J., Bihari, B.L., Wang, A., Schulz, M., Karl, W.: What scientific applications can benefit from hardware transactional memory? In: International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2012 (2012)
Wang, A., Gaudet, M., Wu, P., Ohmacht, M., Amaral, J.N., Barton, C., Silvera, R., Michael, M.: Evaluation of blue gene/Q hardware support for transactional memories. In: PACT (2012)
Wong, M., Bihari, B.L., de Supinski, B.R., Wu, P., Michael, M., Liu, Y., Chen, W.: A case for including transactions in OpenMP. In: Sato, M., Hanawa, T., Müller, M.S., Chapman, B.M., de Supinski, B.R. (eds.) IWOMP 2010. LNCS, vol. 6132, pp. 149–160. Springer, Heidelberg (2010)
Acknowledgments
The authors thank Trent E. D’Hooge of Livermore Computing for his assistance with our inquiries and in accommodating our runs on the local compute nodes.
Intel and Xeon are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
* Other names and brands are the property of their respective owners.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance.
Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Bihari, B.L., Bae, H., Cownie, J., Klemm, M., Terboven, C., Diachin, L. (2015). On the Algorithmic Aspects of Using OpenMP Synchronization Mechanisms II: User-Guided Speculative Locks. In: Terboven, C., de Supinski, B., Reble, P., Chapman, B., Müller, M. (eds) OpenMP: Heterogenous Execution and Data Movements. IWOMP 2015. Lecture Notes in Computer Science, vol 9342. Springer, Cham. https://doi.org/10.1007/978-3-319-24595-9_10
DOI: https://doi.org/10.1007/978-3-319-24595-9_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24594-2
Online ISBN: 978-3-319-24595-9
eBook Packages: Computer Science (R0)