Document Zbl 1160.65359

Datta, Kaushik; Kamil, Shoaib; Williams, Samuel; Oliker, Leonid; Shalf, John; Yelick, Katherine

Optimization and performance modeling of stencil computations on modern microprocessors. (English) Zbl 1160.65359

SIAM Rev. 51, No. 1, 129-159 (2009).

Summary: Stencil-based kernels constitute the core of many important scientific applications on block-structured grids. Unfortunately, these codes achieve a low fraction of peak performance, due primarily to the disparity between processor and main memory speeds. In this paper, we explore the impact of trends in memory subsystems on a variety of stencil optimization techniques and develop performance models to analytically guide our optimizations. Our work targets cache reuse methodologies across single and multiple stencil sweeps, examining cache-aware algorithms as well as cache-oblivious techniques on the Intel Itanium2, AMD Opteron, and IBM Power5.
Additionally, we consider stencil computations on the heterogeneous multicore design of the Cell processor, a machine with an explicitly managed memory hierarchy. Overall, our work represents one of the most extensive analyses of stencil optimizations and performance modeling to date. Results demonstrate that recent trends in memory system organization have reduced the efficacy of traditional cache-blocking optimizations. We also show that a cache-aware implementation is significantly faster than a cache-oblivious approach, while the explicitly managed memory on Cell enables the highest overall efficiency: Cell attains 88% of algorithmic peak, while the best competing cache-based processor achieves only 54% of algorithmic peak performance.
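
Illustration (not part of the original summary): the sketch below shows what spatial cache blocking of a single stencil sweep generally looks like, using a 7-point 3D stencil as an assumed example. The function names, the coefficients alpha and beta, and the block sizes bj and bk are hypothetical and are not taken from the paper. Pure Python will not show any speedup; the point is only the loop structure and the fact that the blocked sweep computes the same values as the naive one.

```python
def make_grid(n):
    # Cubic n x n x n grid with arbitrary but deterministic values.
    return [[[float(i + 2 * j + 3 * k) for k in range(n)]
             for j in range(n)] for i in range(n)]


def copy_grid(g):
    return [[row[:] for row in plane] for plane in g]


def update(src, dst, i, j, k, alpha, beta):
    # 7-point stencil: centre point plus its six face neighbours.
    dst[i][j][k] = alpha * src[i][j][k] + beta * (
        src[i - 1][j][k] + src[i + 1][j][k]
        + src[i][j - 1][k] + src[i][j + 1][k]
        + src[i][j][k - 1] + src[i][j][k + 1])


def sweep_naive(src, dst, n, alpha, beta):
    # Straightforward sweep over all interior points.
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            for k in range(1, n - 1):
                update(src, dst, i, j, k, alpha, beta)


def sweep_blocked(src, dst, n, alpha, beta, bj, bk):
    # Tile the j and k loops so that the planes touched while i advances
    # cover only a bj x bk footprint, which is the part meant to fit in cache.
    for jj in range(1, n - 1, bj):
        for kk in range(1, n - 1, bk):
            for i in range(1, n - 1):
                for j in range(jj, min(jj + bj, n - 1)):
                    for k in range(kk, min(kk + bk, n - 1)):
                        update(src, dst, i, j, k, alpha, beta)


def run_both(n=10, alpha=0.5, beta=0.125, bj=4, bk=3):
    src = make_grid(n)
    a = copy_grid(src)  # boundaries keep their initial values
    b = copy_grid(src)
    sweep_naive(src, a, n, alpha, beta)
    sweep_blocked(src, b, n, alpha, beta, bj, bk)
    return a, b
```

Calling run_both() returns two grids that should be identical, since blocking only reorders the traversal of independent point updates within one sweep. Blocking across multiple sweeps (time skewing) and cache-oblivious traversals, both named in the summary, need more care and are not sketched here.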

Cited in 12 Documents

MSC:

65Y10	Numerical algorithms for specific classes of architectures
65Y20	Complexity and performance of numerical algorithms
65M06	Finite difference methods for initial value and initial-boundary value problems involving PDEs
68M20	Performance evaluation, queueing, and scheduling in the context of computer systems

Keywords:

stencil computations; cache blocking; time skewing; cache-oblivious algorithms; performance modeling; performance evaluation; Intel Itanium2; AMD Opteron; IBM Power5; STI Cell
