Varbench: an Experimental Framework to Measure and Characterize Performance Variability

Published: 13 August 2018

Abstract

Performance variability is a major problem for extreme scale parallel computing applications that rely on bulk synchronization and collective communication. While this problem is most prominent in the context of exascale systems, it is increasingly impacting other communities such as machine learning and graph analytics. In this paper, we present an experimental performance analysis framework called varbench that is designed to precisely measure the prevalence of performance variability in a system, as well as to support workload characterization with respect to how and when a workload generates variability. We demonstrate several of varbench's capabilities as they pertain to exascale-class systems, including its utility for discovering architectural trends, for performing cross-architectural comparisons, and for understanding key statistical properties of performance distributions that have implications for how system software should be designed to mitigate variability.
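To make the abstract's central concern concrete, the following is a minimal sketch of why per-worker variability hurts bulk-synchronous applications: each iteration ends at a barrier, so it takes as long as its slowest worker. This is not varbench's code or methodology. The function names (`simulate_bsp`, `summarize`), the exponential noise model, and all parameter values are assumptions chosen purely for illustration.

```python
import random
import statistics


def simulate_bsp(num_workers, num_iters, base=1.0, jitter=0.05, seed=0):
    """Toy bulk-synchronous loop (hypothetical model, not varbench).

    Each worker's time is `base` plus exponentially distributed noise with
    mean `jitter`. Returns (all per-worker times, per-iteration times),
    where an iteration's time is the maximum over its workers.
    """
    rng = random.Random(seed)
    per_worker = []
    per_iteration = []
    for _ in range(num_iters):
        times = [base + rng.expovariate(1.0 / jitter) for _ in range(num_workers)]
        per_worker.extend(times)
        # The barrier makes every worker wait for the slowest one.
        per_iteration.append(max(times))
    return per_worker, per_iteration


def summarize(samples):
    """Mean, standard deviation, skewness, and excess kurtosis of samples."""
    n = len(samples)
    mean = statistics.fmean(samples)
    std = statistics.pstdev(samples)
    if std == 0:
        return {"mean": mean, "std": 0.0, "skewness": 0.0, "kurtosis": 0.0}
    # Standardized third and fourth moments (population form).
    skew = sum(((x - mean) / std) ** 3 for x in samples) / n
    kurt = sum(((x - mean) / std) ** 4 for x in samples) / n - 3.0
    return {"mean": mean, "std": std, "skewness": skew, "kurtosis": kurt}


if __name__ == "__main__":
    for workers in (4, 64, 1024):
        per_worker, per_iteration = simulate_bsp(workers, num_iters=200)
        w, it = summarize(per_worker), summarize(per_iteration)
        print(f"{workers:5d} workers: mean worker time {w['mean']:.3f}, "
              f"mean iteration time {it['mean']:.3f}")
```

Under this assumed noise model, the mean per-worker time stays roughly constant as the worker count grows, while the mean iteration time rises, because the maximum of more samples reaches further into the tail of the distribution. The shape statistics returned by `summarize` are one simple way to describe such a distribution; which properties matter on a real system is the kind of question the framework is meant to answer.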

Published In

ICPP '18: Proceedings of the 47th International Conference on Parallel Processing
August 2018
945 pages
ISBN:9781450365109
DOI:10.1145/3225058
In-Cooperation

  • University of Oregon

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICPP 2018

Acceptance Rates

ICPP '18 Paper Acceptance Rate 91 of 313 submissions, 29%;
Overall Acceptance Rate 91 of 313 submissions, 29%
