Research article

Empirical Study of Restarted and Flaky Builds on Travis CI

Published: 18 September 2020

Abstract

Continuous Integration (CI) is a development practice in which developers frequently integrate their code into a common codebase. After each integration, the CI server runs a test suite and other tools to produce a set of reports (e.g., the output of linters and tests). If the result of a CI run is unexpected, developers can manually restart the build, re-running the same test suite on the same code; when the restarted build's outcome differs from the original, the build is flaky.
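The flakiness signal described above can be stated compactly: two runs of the same build, on the same commit, that disagree on their outcome. The sketch below is a minimal illustration of that check, not the detection tooling used in the paper; the BuildRun record and its field names are assumptions made for the example.

    from dataclasses import dataclass

    @dataclass
    class BuildRun:
        """One execution of a CI build (hypothetical record layout)."""
        build_id: int
        commit_sha: str  # code under test; identical across restarts
        attempt: int     # 1 = original run, >1 = manual restart
        state: str       # e.g., "passed", "failed", "errored"

    def is_flaky(runs: list[BuildRun]) -> bool:
        """A build is flaky when restarts on the same commit disagree
        on the outcome, since the code itself did not change."""
        assert len({r.commit_sha for r in runs}) == 1, "restarts must share a commit"
        return len({r.state for r in runs}) > 1

    # Example: the original run failed, the manual restart passed.
    runs = [
        BuildRun(build_id=42, commit_sha="a1b2c3d", attempt=1, state="failed"),
        BuildRun(build_id=42, commit_sha="a1b2c3d", attempt=2, state="passed"),
    ]
    print(is_flaky(runs))  # True: the outcome changed with no code change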
In this study, we analyze restarted builds, flaky builds, and their impact on the development workflow. We observe that developers restart at least 1.72% of builds, amounting to 56,522 restarted builds in our Travis CI dataset, and that more mature and more complex projects are more likely to include restarted builds. Restarted builds are mostly builds that initially fail because of a test, a network problem, or a Travis CI limitation such as an execution timeout. Finally, we observe that restarted builds affect the development workflow: in 54.42% of the restarted builds, developers analyze and restart the build within an hour of the initial execution, which suggests that they wait for CI results, interrupting their workflow to address the issue. Restarted builds also slow down the merging of pull requests by a factor of three, raising the median merging time from 16h to 48h.
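As a back-of-the-envelope illustration of these workflow measurements, the sketch below computes the share of builds restarted within an hour and compares median merge times. The timing values are invented stand-ins for this example, not the paper's Travis CI dataset.

    from datetime import timedelta
    from statistics import median

    # Hypothetical gaps between a build's initial run and its manual restart.
    restart_gaps = [timedelta(minutes=12), timedelta(minutes=45),
                    timedelta(hours=3), timedelta(hours=26)]

    within_hour = sum(g <= timedelta(hours=1) for g in restart_gaps) / len(restart_gaps)
    print(f"restarted within an hour: {within_hour:.2%}")  # the paper reports 54.42%

    # Median merge time (hours) for pull requests without vs. with restarted builds.
    merge_h_no_restart = [8, 16, 30]    # stand-in samples; paper's median: 16h
    merge_h_restarted  = [20, 48, 90]   # stand-in samples; paper's median: 48h
    print(median(merge_h_no_restart), median(merge_h_restarted))  # 16 48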


Published In

MSR '20: Proceedings of the 17th International Conference on Mining Software Repositories
June 2020, 675 pages
ISBN: 9781450375177
DOI: 10.1145/3379597

Publisher

Association for Computing Machinery, New York, NY, United States

