Varbench: an Experimental Framework to Measure and Characterize Performance Variability

Authors: Brian Kocoloski, John Lange

ICPP '18: Proceedings of the 47th International Conference on Parallel Processing

Article No.: 18, Pages 1 - 10

https://doi.org/10.1145/3225058.3225125

Published: 13 August 2018

Abstract

Performance variability is a major problem for extreme scale parallel computing applications that rely on bulk synchronization and collective communication. While this problem is most prominent in the context of exascale systems, it is increasingly impacting other communities such as machine learning and graph analytics. In this paper, we present an experimental performance analysis framework called varbench that is designed to precisely measure the prevalence of performance variability in a system, as well as to support workload characterization with respect to how and when a workload generates variability. We demonstrate several of varbench's capabilities as they pertain to exascale-class systems, including its utility for discovering architectural trends, for performing cross-architectural comparisons, and for understanding key statistical properties of performance distributions that have implications for how system software should be designed to mitigate variability.

Index Terms

Varbench: an Experimental Framework to Measure and Characterize Performance Variability
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Multicore architectures
2. General and reference
  1. Cross-computing tools and techniques
    1. Measurement
    2. Performance

Published In

ICPP '18: Proceedings of the 47th International Conference on Parallel Processing

August 2018

945 pages

ISBN: 9781450365109

DOI: 10.1145/3225058

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

In-Cooperation

University of Oregon

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

NSF

Conference

ICPP 2018: 47th International Conference on Parallel Processing

August 13 - 16, 2018

Eugene, OR, USA

Acceptance Rates

ICPP '18 Paper Acceptance Rate 91 of 313 submissions, 29%;

Overall Acceptance Rate 91 of 313 submissions, 29%
