research-article

Automatic speculative parallelization of loops using polyhedral dependence analysis

Authors:

Arnamoy Bhattacharyya,

José Nelson AmaralAuthors Info & Claims

COSMIC '13: Proceedings of the First International Workshop on Code OptimiSation for MultI and many Cores

Article No.: 1, Pages 1 - 9

https://doi.org/10.1145/2446920.2446921

Published: 24 February 2013 Publication History

Abstract

Speculative Execution (SE) runs loops in parallel even in the presence of a dependence. Using polyhedral dependence analysis, more speculation candidate loops can be discovered than normal OpenMP parallelization. In this research, a framework is implemented that can automatically perform speculative parallelization of loops using Polly's [15] polyhedral dependence analysis. The framework uses two different heuristics to find speculation candidates. The first heuristic allows loops with only may dependences to run speculatively in parallel while the second heuristic filters out cold loops and, using profile information, loops with actual run time dependences. The framework is fully automatic. Running SPEC2006 and the PolyBench/C benchmarks on the IBM BlueGene/Q [16] machine shows that the framework is able to discover more parallelization candidates than OpenMP parallelization and achieve better speedup.

References

[1]

Spec2006 benchmark suite. http://www.spec.org/cpu2006/.

[2]

M. M. Baskaran, U. Bondhugula, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan. A compiler framework for optimization of affine loop nests for GPGPUs. In International Conference on Supercomputing (ICS), pages 225--234, 2008.

Digital Library

[3]

P. Berube, A. Preuss, and J. N. Amaral. Combined Profiling: practical collection of feedback information for code optimization. In Proceedings of the second joint WOSP/SIPEW International Conference on Performance engineering, ICPE '11, pages 493--498, Karlshure, Germany, 2011.

Digital Library

[4]

U. Bondhugula, A. Hartono, and J. Ramanujam. A practical automatic polyhedral parallelizer and locality optimizer. In Programming Language Design and Implementation (PLDI), pages 101--113, Tucson, AZ, USA, 2008.

Digital Library

[5]

P. Boulet, A. Darte, G.-A. Silber, and F. Vivien. Loop parallelization algorithms: from parallelism extraction to code generation. In Compiler Optimizations for Scalable Parallel Systems Languages, pages 141--172, 2001.

Digital Library

[6]

M. Cintra, J. F. Martinez, and J. Torrellas. Architectural support for scalable speculative parallelization in shared-memory multiprocessors. In International Symposium on Computer Architecture (ISCA), pages 13--24, Jun 2000.

Digital Library

[7]

A. Darte and F. Vivien. Optimal fine and medium grain parallelism detection in polyhedral reduced dependence graphs. International Journal of Parallel Programming (IJPP), 25(6): 447--496, 1997.

Digital Library

[8]

P. Feautrier. Array expansion. In International Conference on Supercomputing (ICS), July 1988.

Digital Library

[9]

P. Feautrier. Parametric integer programming. In RAIRO Recherche Operationnelle, pages 243--268, 1988.

[10]

P. Feautrier. Some efficient solutions to the affine scheduling problem. part 1. one-dimensional time. International Journal of Parallel Programming (IJPP), 21(5): 313--348, 1992.

Digital Library

[11]

P. Feautrier. Some efficient solutions to the affine scheduling problem. part 2. multidimensional time. International Journal of Parallel Programming (IJPP), 21(6): 389--420, 1992.

Digital Library

[12]

S. Girbal, N. Vasilache, C. Bastoul, A. Cohen, D. Parello, M. Sigler, and O. Temam. Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies. International Journal of Parallel Programming (IJPP), 34(3): 261--317, July 1988.

Digital Library

[13]

S. Gopal, T. Vijaykumar, J. Smith, and G. Sohi. Speculative versioning cache. In International Symposium on High-Performance Computer Architecture, pages 195--205, Feb 1998.

Digital Library

[14]

M. Griebl. Automatic parallelization of loop programs for distributed memory architectures. Habilitation thesis, University of Passau, 2004.

[15]

T. Grosser, H. Zheng, R. A, A. Simbürger, A. Grosslinger, and L.-N. Pouchet. Polly - polyhedral optimization in llvm. In First International Workshop on Polyhedral Compilation Techniques (IMPACT), Chamonix, France, Apr 2011.

[16]

R. Haring, M. Ohmacht, T. Fox, M. Gschwind, D. Satterfield, K. Sugavanam, P. Coteus, P. Heidelberger, M. Blumrich, R. Wisniewski, A. Gara, G.-T. Chiu, P. Boyle, N. Chist, and C. Kim. The IBM Blue Gene/Q compute chip. Micro, IEEE, 32(2): 48--60, March-April 2012.

Digital Library

[17]

A. Jimborean, P. Clauss, B. Pradelle, L. Mastrangelo, and V. Loechner. Adapting the polyhedral model as a framework for efficient speculative parallelization. In Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming, PPoPP '12, pages 295--296, New Orleans, Louisiana, USA, 2012.

Digital Library

[18]

H. Kim, N. P. Johnson, J. W. Lee, S. A. Mahlke, and D. I. August. Automatic speculative doall for clusters. In Code Generation and Optimization (CGO), pages 94--103, 2012.

Digital Library

[19]

X. Kong, D. Klappholz, and K. Psarris. The I Test: A New Test for Subscript Data Dependence. In International Conference on Parallel Processing (ICPP), pages 204--211, 1990.

[20]

C. Lattner. The llvm compiler infrastructure. http://llvm.org.

[21]

J. F. Martínez and J. Torrellas. Speculative synchronization: applying thread-level speculation to explicitly parallel applications. In International Conference on Architectural support for programming languages and operating systems (ASPLOS), pages 18--29, San Jose, California, 2002.

Digital Library

[22]

M. Wolfe. High performance compilers for parallel computing. Addison Wesley, 1996.

Digital Library

[23]

S. Pop, A. Cohen, C. Bastoul, S. Girbal, G. Andre Silber, and N. Vasilache. Graphite: Polyhedral analyses and optimizations for GCC. In Proceedings of the 2006 GCC Developers Summit, 2006.

[24]

L.-N. Pouchet. Polybench/c benchmarks. http://www.cse.ohio-state.edu/pouchet/software/polybench/.

[25]

W. Pugh. The Omega Test: a fast and practical integer programming algorithm for dependence analysis. Communications of the ACM, 8: 4--13, 1992.

[26]

L. Rauchwerger and D. Padua. The LRPD test: Speculative run-time parallelization of loops with privatization and reduction parallelization. In Programming Language Design and Implementation (PLDI), pages 218--232, La Jolla, California, USA, 1995.

Digital Library

[27]

J. G. Steffan, C. B. Colohan, A. Zhai, and T. C. Mowry. A scalable approach to thread-level speculation. In International Symposium on Computer Architecture (ISCA), pages 1--12, Vancouver, British Columbia, Canada, 2000.

Digital Library

[28]

M. walid Benabderrahmane, L. Noel Pouchet, A. Cohen, and C. Bastoul. The Polyhedral Model is more widely applicable than you think. In Proceedings of the 19th joint European conference on Theory and Practice of Software, international conference on Compiler Construction, CC'10/ETAPS'10, pages 283--303, Berlin, Heidelberg, 2010.

Digital Library

[29]

A. Wang, M. Gaudet, P. Wu, J. N. Amaral, M. Ohmacht, C. Barton, R. Silvera, and M. Michael. Evaluation of Blue Gene/Q hardware support for transactional memories. In International conference on Parallel architectures and compilation techniques (PACT), pages 127--136, Minneapolis, Minnesota, USA.

Digital Library

Cited By

Kumar SSingh SAggarwal NGupta BAlhalabi WBand S(2022)An efficient hardware supported and parallelization architecture for intelligent systems to overcome speculative overheadsInternational Journal of Intelligent Systems10.1002/int.2306237:12(11764-11790)Online publication date: 8-Sep-2022
https://doi.org/10.1002/int.23062
Omar REl-Mahdy ARohou E(2020)IR-Level Dynamic Data Dependence Using Abstract Interpretation Towards Speculative ParallelizationIEEE Access10.1109/ACCESS.2020.29977158(99910-99921)Online publication date: 2020
https://doi.org/10.1109/ACCESS.2020.2997715
Prema SNasre RJehadeesan RPanigrahi B(2019)A study on popular auto‐parallelization frameworksConcurrency and Computation: Practice and Experience10.1002/cpe.516831:17Online publication date: 11-Feb-2019
https://doi.org/10.1002/cpe.5168
Show More Cited By

Index Terms

Automatic speculative parallelization of loops using polyhedral dependence analysis
1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel programming languages
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language types
        Parallel programming languages

Recommendations

A polyhedral compilation framework for loops with dynamic data-dependent bounds
CC '18: Proceedings of the 27th International Conference on Compiler Construction

We study the parallelizing compilation and loop nest optimization of an important class of programs where counted loops have a dynamic data-dependent upper bound. Such loops are amenable to a wider set of transformations than general while loops with ...
A practical automatic polyhedral parallelizer and locality optimizer
PLDI '08: Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation

We present the design and implementation of an automatic polyhedral source-to-source transformation framework that can optimize regular programs (sequences of possibly imperfectly nested loops) for parallelism and locality simultaneously. Through this ...
Speculative Parallelization of Partially Parallel Loops

Comments

Information & Contributors

Information

Published In

cover image ACM Other conferences

COSMIC '13: Proceedings of the First International Workshop on Code OptimiSation for MultI and many Cores

February 2013

34 pages

ISBN:9781450319713

DOI:10.1145/2446920

Program Chairs:
Zheng Wang
The University of Edinburgh
,
Hugh Leather
The University of Edinburgh

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

AMD

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 February 2013

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Conference

COSMIC '13

Sponsor:

COSMIC '13: International Workshop on Code Optimisation for Multi and Many Cores

February 24, 2013

Shenzhen, China

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

7
Total Citations
View Citations
204
Total Downloads

Downloads (Last 12 months)12
Downloads (Last 6 weeks)8

Reflects downloads up to 21 Oct 2024

Other Metrics

View Author Metrics

Citations

Cited By

Kumar SSingh SAggarwal NGupta BAlhalabi WBand S(2022)An efficient hardware supported and parallelization architecture for intelligent systems to overcome speculative overheadsInternational Journal of Intelligent Systems10.1002/int.2306237:12(11764-11790)Online publication date: 8-Sep-2022
https://doi.org/10.1002/int.23062
Omar REl-Mahdy ARohou E(2020)IR-Level Dynamic Data Dependence Using Abstract Interpretation Towards Speculative ParallelizationIEEE Access10.1109/ACCESS.2020.29977158(99910-99921)Online publication date: 2020
https://doi.org/10.1109/ACCESS.2020.2997715
Prema SNasre RJehadeesan RPanigrahi B(2019)A study on popular auto‐parallelization frameworksConcurrency and Computation: Practice and Experience10.1002/cpe.516831:17Online publication date: 11-Feb-2019
https://doi.org/10.1002/cpe.5168
Liu BHe JGeng YHuang LLi S(2017)Toward Emotion-Aware Computing: A Loop Selection Approach Based on Machine Learning for Speculative MultithreadingIEEE Access10.1109/ACCESS.2017.26841295(3675-3686)Online publication date: 2017
https://doi.org/10.1109/ACCESS.2017.2684129
Sotomayor RSanchez LGarcia Blas JFernandez JGarcia J(2017)Automatic CPU/GPU Generation of Multi-versioned OpenCL Kernels for C++ Scientific ApplicationsInternational Journal of Parallel Programming10.1007/s10766-016-0425-645:2(262-282)Online publication date: 1-Apr-2017
https://dl.acm.org/doi/10.1007/s10766-016-0425-6
Estebanez ALlanos DGonzalez-Escribano A(2016)A Survey on Thread-Level Speculation TechniquesACM Computing Surveys10.1145/293836949:2(1-39)Online publication date: 30-Jun-2016
https://dl.acm.org/doi/10.1145/2938369
Bhattacharyya AAmaral JFinkel H(2015)Data-dependence profiling to enable safe thread level speculationProceedings of the 25th Annual International Conference on Computer Science and Software Engineering10.5555/2886444.2886459(91-100)Online publication date: 2-Nov-2015
https://dl.acm.org/doi/10.5555/2886444.2886459

View Options

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents