skip to main content
research-article

Learning from mistakes: a comprehensive study on real world concurrency bug characteristics

Published: 01 March 2008 Publication History

Abstract

The reality of multi-core hardware has made concurrent programs pervasive. Unfortunately, writing correct concurrent programs is difficult. Addressing this challenge requires advances in multiple directions, including concurrency bug detection, concurrent program testing, concurrent programming model design, etc. Designing effective techniques in all these directions will significantly benefit from a deep understanding of real world concurrency bug characteristics.
This paper provides the first (to the best of our knowledge) comprehensive real world concurrency bug characteristic study. Specifically, we have carefully examined concurrency bug patterns, manifestation, and fix strategies of 105 randomly selected real world concurrency bugs from 4 representative server and client open-source applications (MySQL, Apache, Mozilla and OpenOffice). Our study reveals several interesting findings and provides useful guidance for concurrency bug detection, testing, and concurrent programming language design.
Some of our findings are as follows: (1) Around one third of the examined non-deadlock concurrency bugs are caused by violation to programmers' order intentions, which may not be easily expressed via synchronization primitives like locks and transactional memories; (2) Around 34% of the examined non-deadlock concurrency bugs involve multiple variables, which are not well addressed by existing bug detection tools; (3) About 92% of the examined concurrency bugs canbe reliably triggered by enforcing certain orders among no more than 4 memory accesses. This indicates that testing concurrent programs can target at exploring possible orders among every small groups of memory accesses, instead of among all memory accesses; (4) About 73% of the examinednon-deadlock concurrency bugs were not fixed by simply adding or changing locks, and many of the fixes were not correct at the first try, indicating the difficulty of reasoning concurrent execution by programmers.

Supplementary Material

JPG File (1346323.jpg)
index.html (index.html)
Slides from the presentation
ZIP File (p329-shanlu-slides.zip)
Supplemental material for Learning from mistakes: a comprehensive study on real world concurrency bug characteristics
Audio only (1346323.mp3)
Video (1346323.mp4)

References

[1]
A.-R. Adl-Tabatabai, C. Kozyrakis, and B. Saha. Transactional programming in a multi-core environment. In PPOPP, 2007.
[2]
C. S. Ananian, K. Asanovic, B. C. Kuszmaul, C. E. Leiserson, and S. Lie. Unbounded transactional memory. In HPCA, 2005.
[3]
C. Boyapati, R. Lee, and M. Rinard. Ownership types for safe programming: Preventing data races and deadlocks. In OOPSLA, 2002.
[4]
A. Bron, E. Farchi, Y. Magid, Y. Nir, and S. Ur. Applications of synchronization coverage. In PPoPP, 2005.
[5]
B. D. Carlstrom, A. McDonald, H. Chafi, J. Chung, C. C. Minh, C. Kozyrakis, and K. Olukotun. The atomos transactional programming language. In PLDI '06, 2006.
[6]
S. Chandra and P. M. Chen. Whither generic recovery from application faults? a fault study using open-source software. In DSN, 2000.
[7]
J.-D. Choi et al. Efficient and precise datarace detection for multithreaded object-oriented programs. In PLDI, 2002.
[8]
A. Chou, J. Yang, B. Chelf, S. Hallem, and D. R. Engler. An empirical study of operating system errors. In SOSP, 2001.
[9]
O. Edelstein, E. Farchi, Y. Nir, G. Ratsaby, and S. Ur. Multi-threaded java program test generation. IBM Systems Journal, 2002.
[10]
D. Engler and K. Ashcraft. RacerX: Effective, static detection of race conditions and deadlocks. In SOSP, 2003.
[11]
E. Farchi, Y. Nir, and S. Ur. Concurrent bug patterns and how to test them. In IPDPS, 2003.
[12]
C. Flanagan and S. N. Freund. Atomizer: a dynamic atomicity checker for multithreaded programs. In POPL, 2004.
[13]
P. Godefroid. Model checking for programming languages using verisoft. In POPL, 1997.
[14]
W. Gu, Z. Kalbarczyk, R. K. Iyer, and Z. Yang. Characterization of linux kernel behavior under errors. In DSN, 2003.
[15]
L. Hammond, V. Wong, M. Chen, B. D. Carlstrom, J. D. Davis, B. Hertzberg, M. K. Prabhu, H. Wijaya, C. Kozyrakis, and K. Olukotun. Transactional memory coherence and consistency. In ISCA, 2004.
[16]
T. Harris and K. Fraser. Language support for lightweight transactions. In OOPSLA, 2003.
[17]
T. Harris, S. Marlow, S. Peyton-Jones, and M. Herlihy. Composable memory transactions. In PPoPP '05, 2005.
[18]
R. Hastings and B. Joyce. Purify: Fast detection of memory leaks and access errors. In Usenix, 1992.
[19]
Z. Li, S. Lu, S. Myagmar, and Y. Zhou. CP-Miner: A tool for finding copy-paste and related bugs in o perating system code. In OSDI, 2004.
[20]
Z. Li, L. Tan, X. Wang, S. Lu, Y. Zhou, and C. Zhai. Have things changed now?: an empirical study of bug characteristics in modern open source software. In Proceedings of the 1st workshop on Architectural and system support for improving software dependability (ASID'06), 2006.
[21]
S. Lu, W. Jiang, and Y. Zhou. A study of interleaving coverage criteria. In FSE, 2007.
[22]
S. Lu, S. Park, C. Hu, X. Ma, W. Jiang, Z. Li, R. A. Popa, and Y. Zhou. Muvi: Automatically inferring multi-variable access correlations and detecting related semantic and concurrency bugs. In SOSP07, 2007.
[23]
S. Lu, J. Tucek, F. Qin, and Y. Zhou. Avio: Detecting atomicity violations via access interleaving invariants. In ASPLOS, 2006.
[24]
B. McCloskey, F. Zhou, D. Gay, and E. Brewer. Autolocker: synchronization inference for atomic sections. In POPL, 2006.
[25]
M. Moir. Transparent support for wait-free transactions. In 11th International Workshop on Distributed Algorithms, 1997.
[26]
K. E. Moore, J. Bobba, M. J. Moravan, M. D. Hill, and D. A. Wood. Logtm: Log-based transactional memory. In HPCA, 2006.
[27]
J. E. B. Moss and A. L. Hosking. Nested transactional memory: model and architecture sketches. Sci. Comput. Program., 2006.
[28]
M. Musuvathi and S. Qadeer. Iterative context bounding for systematic testing of multithreaded programs. In PLDI, 2007.
[29]
G. C. Necula, S. McPeak, and W. Weimer. CCured: Type-safe retrofitting of legacy code. In POPL, 2002.
[30]
N. Nethercote and J. Seward. Valgrind: A program supervision framework. ENTCS, 2003.
[31]
R. H. B. Netzer and B. P. Miller. Improving the accuracy of data race detection. In PPoPP, 1991.
[32]
T. Ostrand, E. Weyuker, and R. Bell. Predicting the location and number of faults in large software systems. TSE, 2005.
[33]
M. Prvulovic and J. Torrellas. ReEnact: Using thread-level speculation mechanisms to debug data races in multithreaded codes. In ISCA, 2003.
[34]
S. Qadeer and D. Wu. Kiss: keep it simple and sequential. In PLDI, 2004.
[35]
F. Qin, J. Tucek, J. Sundaresan, and Y. Zhou. Rx: Treating bugs as allergies c a safe method to survive software failures. In SOSP, 2005.
[36]
H. E. Ramadan, C. J. Rossbach, D. E. Porter, O. S. Hofmann, A. Bhandari, and E. Witchel. Metatm/txlinux: transactional memory for an operating system. In ISCA, 2007.
[37]
S. Savage, M. Burrows, G. Nelson, P. Sobalvarro, and T. Anderson. Eraser: A dynamic data race detector for multithreaded programs. ACM TOCS, 1997.
[38]
M. Sullivan and R. Chillarege. A comparison of software defects in database management systems and operating systems. In FTCS, 1992.
[39]
R. N. Taylor, D. L. Levine, and C. D. Kelly. Structural testing of concurrent programs. IEEE Trans. Softw. Eng., 1992.
[40]
M. Vaziri, F. Tip, and J. Dolby. Associating synchronization constraints with data in an object-oriented language. In POPL, 2006.
[41]
M. Xu, R. Bodík, and M. D. Hill. A serializability violation detector for shared-memory server programs. In PLDI, 2005.
[42]
Y. Yu, T. Rodeheffer, and W. Chen. Racetrack: Efficient detection of data race conditions via adaptive tracking. In SOSP, 2005.
[43]
Z. Li et. al. Have things changed now? - an empirical study of bug characteristics in modern open source software. In ASID workshop in ASPLOS, 2006.

Cited By

View all
  • (2024)FAIL: Analyzing Software Failures from the News Using LLMsProceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering10.1145/3691620.3695022(506-518)Online publication date: 27-Oct-2024
  • (2024)Choral: Object-oriented Choreographic ProgrammingACM Transactions on Programming Languages and Systems10.1145/363239846:1(1-59)Online publication date: 16-Jan-2024
  • (2024)When Is Parallelism Fearless and Zero-Cost with Rust?Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3626183.3659966(27-40)Online publication date: 17-Jun-2024
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
ASPLOS XIII: Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
March 2008
352 pages
ISBN:9781595939586
DOI:10.1145/1346281
  • cover image ACM SIGPLAN Notices
    ACM SIGPLAN Notices  Volume 43, Issue 3
    ASPLOS '08
    March 2008
    339 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/1353536
    Issue’s Table of Contents
  • cover image ACM SIGARCH Computer Architecture News
    ACM SIGARCH Computer Architecture News  Volume 36, Issue 1
    ASPLOS '08
    March 2008
    339 pages
    ISSN:0163-5964
    DOI:10.1145/1353534
    Issue’s Table of Contents
  • cover image ACM SIGOPS Operating Systems Review
    ACM SIGOPS Operating Systems Review  Volume 42, Issue 2
    ASPLOS '08
    March 2008
    339 pages
    ISSN:0163-5980
    DOI:10.1145/1353535
    Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 March 2008

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. bug characteristics
  2. concurrency bug
  3. concurrent program

Qualifiers

  • Research-article

Conference

ASPLOS08

Acceptance Rates

ASPLOS XIII Paper Acceptance Rate 31 of 127 submissions, 24%;
Overall Acceptance Rate 535 of 2,713 submissions, 20%

Upcoming Conference

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)545
  • Downloads (Last 6 weeks)73
Reflects downloads up to 26 Oct 2024

Other Metrics

Citations

Cited By

View all
  • (2024)FAIL: Analyzing Software Failures from the News Using LLMsProceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering10.1145/3691620.3695022(506-518)Online publication date: 27-Oct-2024
  • (2024)Choral: Object-oriented Choreographic ProgrammingACM Transactions on Programming Languages and Systems10.1145/363239846:1(1-59)Online publication date: 16-Jan-2024
  • (2024)When Is Parallelism Fearless and Zero-Cost with Rust?Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3626183.3659966(27-40)Online publication date: 17-Jun-2024
  • (2024)Understanding Transaction Bugs in Database SystemsProceedings of the IEEE/ACM 46th International Conference on Software Engineering10.1145/3597503.3639207(1-13)Online publication date: 20-May-2024
  • (2024)Reorder Pointer Flow in Sound Concurrency Bug PredictionProceedings of the IEEE/ACM 46th International Conference on Software Engineering10.1145/3597503.3623300(1-13)Online publication date: 20-May-2024
  • (2024)An Empirical Study of Performance Interference: Timing Violation Patterns and Impacts2024 IEEE 30th Real-Time and Embedded Technology and Applications Symposium (RTAS)10.1109/RTAS61025.2024.00033(320-333)Online publication date: 13-May-2024
  • (2024)ConDU: Method for On-the-fly Detection of Non-deadlock Concurrency Errors in UAV Software2024 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW)10.1109/ICSTW60967.2024.00037(137-143)Online publication date: 27-May-2024
  • (2024)Petri Net Unfolding-Based Detection and Replay of Program DeadlocksIEEE Access10.1109/ACCESS.2024.338448912(53713-53738)Online publication date: 2024
  • (2024)A dynamic broad TSK fuzzy classifier based on iterative learning on progressively rebalanced dataInformation Sciences10.1016/j.ins.2024.120976677(120976)Online publication date: Aug-2024
  • (2024)Evaluation of LLM Tools for Feedback Generation in a Course on Concurrent ProgrammingInternational Journal of Artificial Intelligence in Education10.1007/s40593-024-00406-0Online publication date: 15-May-2024
  • Show More Cited By

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media