Pipeline spectroscopy

Authors:

Thomas R. Puzak,

Arthur Nadas

ExpCS '07: Proceedings of the 2007 workshop on Experimental computer science

Pages 15 - es

https://doi.org/10.1145/1281700.1281715

Published: 13 June 2007

Abstract

Pipeline Spectroscopy is a new technique that allows us to measure the cost of each cache miss. The cost of a miss is displayed (graphed) as a histogram, which gives a precise, detailed visualization of the cost of each cache miss throughout all levels of the memory hierarchy. We call the graphs 'spectrograms' because they reveal certain signature characteristics of the processor's memory hierarchy, the pipeline, and the miss pattern itself. We show that in a memory hierarchy with N cache levels (L1, L2, ..., L_N, and memory) and a miss cluster of size C, there are $\binom{C+N}{C}$ possible miss penalties. This represents all possible sums from all possible combinations of the miss latencies from each level of the memory hierarchy (L2, L3, ..., memory) for a given cluster size. Additionally, a theory is presented that describes the shape of a spectrogram, and we use this theory to predict the shape of spectrograms for larger miss clusters. Detailed analysis of a spectrogram leads to much greater insight into pipeline dynamics, including effects due to prefetching and miss queueing delays.
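The penalty count can be illustrated with a small sketch. It reads the count in the abstract as the binomial coefficient "C+N choose C" and, as an assumption not stated in the abstract, lets each miss in a cluster contribute either nothing (for example, because it is fully overlapped) or one of the N latencies (L2, ..., L_N, memory). Under that reading, the candidate penalties are the multisets of size C drawn from N+1 choices, and there are exactly "C+N choose C" of them. The function name and the latency values are hypothetical and chosen only for illustration.

```python
from itertools import combinations_with_replacement
from math import comb


def miss_penalties(latencies, cluster_size):
    """Enumerate candidate penalties for a cluster of `cluster_size` misses.

    Assumption (not from the abstract): each miss contributes either
    0 cycles or one of the N latencies in `latencies`.
    """
    choices = (0, *latencies)  # N latencies plus the assumed zero contribution
    return [
        sum(combo)
        for combo in combinations_with_replacement(choices, cluster_size)
    ]


# Hypothetical latencies in cycles for L2, L3, and memory (so N = 3).
latencies = (10, 100, 1000)
for c in (1, 2, 3):
    penalties = miss_penalties(latencies, c)
    # The number of combinations matches "C+N choose C".
    assert len(penalties) == comb(c + len(latencies), c)
    print(c, len(penalties), sorted(penalties))
```

If two different combinations happen to sum to the same value, they fall into the same histogram bin, so the number of distinct peaks in a spectrogram can be smaller than the number of combinations.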

Index Terms

Pipeline spectroscopy
1. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language features

Published In

ExpCS '07: Proceedings of the 2007 workshop on Experimental computer science

June 2007

218 pages

ISBN:9781595937513

DOI:10.1145/1281700

Copyright © 2007 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ExpCS07: Workshop on Experimental Computer Science

June 13 - 14, 2007

San Diego, California
