-
Governing the Commons: Code Ownership and Code-Clones in Large-Scale Software Development
Authors:
Anders Sundelin,
Javier Gonzalez-Huerta,
Richard Torkar,
Krzysztof Wnuk
Abstract:
Context: In software development organizations employing weak or collective ownership, different teams are allowed and expected to autonomously perform changes in various components. This creates diversity both in the knowledge of, and in the responsibility for, individual components.
Objective: Our objective is to understand how and why different teams introduce technical debt in the form of co…
▽ More
Context: In software development organizations employing weak or collective ownership, different teams are allowed and expected to autonomously perform changes in various components. This creates diversity both in the knowledge of, and in the responsibility for, individual components.
Objective: Our objective is to understand how and why different teams introduce technical debt in the form of code clones as they change different components.
Method: We collected data about change size and clone introductions made by ten teams in eight components which was part of a large industrial software system. We then designed a Multi-Level Generalized Linear Model (MLGLM), to illustrate the teams' differing behavior. Finally, we discussed the results with three development teams, plus line manager and the architect team, evaluating whether the model inferences aligned with what they expected. Responses were recorded and thematically coded.
Results: The results show that teams do behave differently in different components, and the feedback from the teams indicates that this method of illustrating team behavior can be useful as a complement to traditional summary statistics of ownership.
Conclusions: We find that our model-based approach produces useful visualizations of team introductions of code clones as they change different components. Practitioners stated that the visualizations gave them insights that were useful, and by comparing with an average team, inter-team comparisons can be avoided. Thus, this has the potential to be a useful feedback tool for teams in software development organizations that employ weak or collective ownership.
△ Less
Submitted 27 September, 2024; v1 submitted 24 May, 2024;
originally announced May 2024.
-
A Second Look at the Impact of Passive Voice Requirements on Domain Modeling: Bayesian Reanalysis of an Experiment
Authors:
Julian Frattini,
Davide Fucci,
Richard Torkar,
Daniel Mendez
Abstract:
The quality of requirements specifications may impact subsequent, dependent software engineering (SE) activities. However, empirical evidence of this impact remains scarce and too often superficial as studies abstract from the phenomena under investigation too much. Two of these abstractions are caused by the lack of frameworks for causal inference and frequentist methods which reduce complex data…
▽ More
The quality of requirements specifications may impact subsequent, dependent software engineering (SE) activities. However, empirical evidence of this impact remains scarce and too often superficial as studies abstract from the phenomena under investigation too much. Two of these abstractions are caused by the lack of frameworks for causal inference and frequentist methods which reduce complex data to binary results. In this study, we aim to demonstrate (1) the use of a causal framework and (2) contrast frequentist methods with more sophisticated Bayesian statistics for causal inference. To this end, we reanalyze the only known controlled experiment investigating the impact of passive voice on the subsequent activity of domain modeling. We follow a framework for statistical causal inference and employ Bayesian data analysis methods to re-investigate the hypotheses of the original study. Our results reveal that the effects observed by the original authors turned out to be much less significant than previously assumed. This study supports the recent call to action in SE research to adopt Bayesian data analysis, including causal frameworks and Bayesian statistics, for more sophisticated causal inference.
△ Less
Submitted 16 February, 2024;
originally announced February 2024.
-
Applying Bayesian Data Analysis for Causal Inference about Requirements Quality: A Controlled Experiment
Authors:
Julian Frattini,
Davide Fucci,
Richard Torkar,
Lloyd Montgomery,
Michael Unterkalmsteiner,
Jannik Fischbach,
Daniel Mendez
Abstract:
It is commonly accepted that the quality of requirements specifications impacts subsequent software engineering activities. However, we still lack empirical evidence to support organizations in deciding whether their requirements are good enough or impede subsequent activities. We aim to contribute empirical evidence to the effect that requirements quality defects have on a software engineering ac…
▽ More
It is commonly accepted that the quality of requirements specifications impacts subsequent software engineering activities. However, we still lack empirical evidence to support organizations in deciding whether their requirements are good enough or impede subsequent activities. We aim to contribute empirical evidence to the effect that requirements quality defects have on a software engineering activity that depends on this requirement. We conduct a controlled experiment in which 25 participants from industry and university generate domain models from four natural language requirements containing different quality defects. We evaluate the resulting models using both frequentist and Bayesian data analysis. Contrary to our expectations, our results show that the use of passive voice only has a minor impact on the resulting domain models. The use of ambiguous pronouns, however, shows a strong effect on various properties of the resulting domain models. Most notably, ambiguous pronouns lead to incorrect associations in domain models. Despite being equally advised against by literature and frequentist methods, the Bayesian data analysis shows that the two investigated quality defects have vastly different impacts on software engineering activities and, hence, deserve different levels of attention. Our employed method can be further utilized by researchers to improve reliable, detailed empirical evidence on requirements quality.
△ Less
Submitted 13 September, 2024; v1 submitted 2 January, 2024;
originally announced January 2024.
-
Towards Causal Analysis of Empirical Software Engineering Data: The Impact of Programming Languages on Coding Competitions
Authors:
Carlo A. Furia,
Richard Torkar,
Robert Feldt
Abstract:
There is abundant observational data in the software engineering domain, whereas running large-scale controlled experiments is often practically impossible. Thus, most empirical studies can only report statistical correlations -- instead of potentially more insightful and robust causal relations. To support analyzing purely observational data for causal relations, and to assess any differences bet…
▽ More
There is abundant observational data in the software engineering domain, whereas running large-scale controlled experiments is often practically impossible. Thus, most empirical studies can only report statistical correlations -- instead of potentially more insightful and robust causal relations. To support analyzing purely observational data for causal relations, and to assess any differences between purely predictive and causal models of the same data, this paper discusses some novel techniques based on structural causal models (such as directed acyclic graphs of causal Bayesian networks). Using these techniques, one can rigorously express, and partially validate, causal hypotheses; and then use the causal information to guide the construction of a statistical model that captures genuine causal relations -- such that correlation does imply causation. We apply these ideas to analyzing public data about programmer performance in Code Jam, a large world-wide coding contest organized by Google every year. Specifically, we look at the impact of different programming languages on a participant's performance in the contest. While the overall effect associated with programming languages is weak compared to other variables -- regardless of whether we consider correlational or causal links -- we found considerable differences between a purely associational and a causal analysis of the very same data. The takeaway message is that even an imperfect causal analysis of observational data can help answer the salient research questions more precisely and more robustly than with just purely predictive techniques -- where genuine causal effects may be confounded.
△ Less
Submitted 1 September, 2023; v1 submitted 18 January, 2023;
originally announced January 2023.
-
The Broken Windows Theory Applies to Technical Debt
Authors:
William Levén,
Hampus Broman,
Terese Besker,
Richard Torkar
Abstract:
Context: The term technical debt (TD) describes the aggregation of sub-optimal solutions that serve to impede the evolution and maintenance of a system. Some claim that the broken windows theory (BWT), a concept borrowed from criminology, also applies to software development projects. The theory states that the presence of indications of previous crime (such as a broken window) will increase the l…
▽ More
Context: The term technical debt (TD) describes the aggregation of sub-optimal solutions that serve to impede the evolution and maintenance of a system. Some claim that the broken windows theory (BWT), a concept borrowed from criminology, also applies to software development projects. The theory states that the presence of indications of previous crime (such as a broken window) will increase the likelihood of further criminal activity; TD could be considered the broken windows of software systems.
Objective: To empirically investigate the causal relationship between the TD density of a system and the propensity of developers to introduce new TD during the extension of that system.
Method: The study used a mixed-methods research strategy consisting of a controlled experiment with an accompanying survey and follow-up interviews. The experiment had a total of 29 developers of varying experience levels completing system extension tasks in already existing systems with high or low TD density. Results: The analysis revealed significant effects of TD level on the subjects' tendency to re-implement (rather than reuse) functionality, choose non-descriptive variable names, and introduce other code smells identified by the software tool SonarQube, all with at least 95% credible intervals.
Conclusions: Three separate significant results along with a validating qualitative result combine to form substantial evidence of the BWT's existence in software engineering contexts. This study finds that existing TD can have a major impact on developers propensity to introduce new TD of various types during development.
△ Less
Submitted 25 December, 2023; v1 submitted 4 September, 2022;
originally announced September 2022.
-
Take a deep breath. Benefits of neuroplasticity practices for software developers and computer workers in a family of experiments
Authors:
Birgit Penzenstadler,
Richard Torkar,
Cristina Martinez Montes
Abstract:
Context. Computer workers in general, and software developers specifically, are under a high amount of stress due to continuous deadlines and, often, over-commitment. Objective. This study investigates the effects of a neuroplasticity practice, a specific breathing practice, on the attention awareness, well-being, perceived productivity, and self-efficacy of computer workers. Method. We created a…
▽ More
Context. Computer workers in general, and software developers specifically, are under a high amount of stress due to continuous deadlines and, often, over-commitment. Objective. This study investigates the effects of a neuroplasticity practice, a specific breathing practice, on the attention awareness, well-being, perceived productivity, and self-efficacy of computer workers. Method. We created a questionnaire mainly from existing, validated scales as entry and exit survey for data points for comparison before and after the intervention. The intervention was a 12-week program with a weekly live session that included a talk on a well-being topic and a facilitated group breathing session. During the intervention period, we solicited one daily journal note and one weekly well-being rating. We replicated the intervention in a similarly structured 8-week program. The data was analyzed using a Bayesian multi-level model for the quantitative part and thematic analysis for the qualitative part. Results. The intervention showed improvements in participants' experienced inner states despite an ongoing pandemic and intense outer circumstances for most. Over the course of the study, we found an improvement in the participants' ratings of how often they found themselves in good spirits as well as in a calm and relaxed state. We also aggregate a large number of deep inner reflections and growth processes that may not have surfaced for the participants without deliberate engagement in such a program. Conclusion. The data indicates usefulness and effectiveness of an intervention for computer workers in terms of increasing well-being and resilience. Everyone needs a way to deliberately relax, unplug, and recover. Breathing practice is a simple way to do so, and the results call for establishing a larger body of work to make this common practice.
△ Less
Submitted 3 March, 2022; v1 submitted 11 September, 2021;
originally announced September 2021.
-
Not All Requirements Prioritization Criteria Are Equal at All Times: A Quantitative Analysis
Authors:
Richard Berntsson Svensson,
Richard Torkar
Abstract:
Requirement prioritization is recognized as an important decision-making activity in requirements engineering and software development. Requirement prioritization is applied to determine which requirements should be implemented and released. In order to prioritize requirements, there are several approaches/techniques/tools that use different requirements prioritization criteria, which are often id…
▽ More
Requirement prioritization is recognized as an important decision-making activity in requirements engineering and software development. Requirement prioritization is applied to determine which requirements should be implemented and released. In order to prioritize requirements, there are several approaches/techniques/tools that use different requirements prioritization criteria, which are often identified by gut feeling instead of an in-depth analysis of which criteria are most important to use. Therefore, in this study we investigate which requirements prioritization criteria are most important to use in industry when determining which requirements are implemented and released, and if the importance of the criteria change depending on how far a requirement has reached in the development process. We conducted a quantitative study of one completed project from one software developing company by extracting 32,139 requirements prioritization decisions based on eight requirements prioritization criteria for 11,110 requirements. The results show that not all requirements prioritization criteria are equally important, and this change depending on how far a requirement has reached in the development process.
△ Less
Submitted 20 December, 2023; v1 submitted 13 April, 2021;
originally announced April 2021.
-
Applying Bayesian Analysis Guidelines to Empirical Software Engineering Data: The Case of Programming Languages and Code Quality
Authors:
Carlo A. Furia,
Richard Torkar,
Robert Feldt
Abstract:
Statistical analysis is the tool of choice to turn data into information, and then information into empirical knowledge. To be valid, the process that goes from data to knowledge should be supported by detailed, rigorous guidelines, which help ferret out issues with the data or model, and lead to qualified results that strike a reasonable balance between generality and practical relevance. Such gu…
▽ More
Statistical analysis is the tool of choice to turn data into information, and then information into empirical knowledge. To be valid, the process that goes from data to knowledge should be supported by detailed, rigorous guidelines, which help ferret out issues with the data or model, and lead to qualified results that strike a reasonable balance between generality and practical relevance. Such guidelines are being developed by statisticians to support the latest techniques for Bayesian data analysis. In this article, we frame these guidelines in a way that is apt to empirical research in software engineering.
To demonstrate the guidelines in practice, we apply them to reanalyze a GitHub dataset about code quality in different programming languages. The dataset's original analysis (Ray et al., 2014) and a critical reanalysis (Berger at al., 2019) have attracted considerable attention -- in no small part because they target a topic (the impact of different programming languages) on which strong opinions abound. The goals of our reanalysis are largely orthogonal to this previous work, as we are concerned with demonstrating, on data in an interesting domain, how to build a principled Bayesian data analysis and to showcase some of its benefits. In the process, we will also shed light on some critical aspects of the analyzed data and of the relationship between programming languages and code quality.
The high-level conclusions of our exercise will be that Bayesian statistical techniques can be applied to analyze software engineering data in a way that is principled, flexible, and leads to convincing results that inform the state of the art while highlighting the boundaries of its validity. The guidelines can support building solid statistical analyses and connecting their results, and hence help buttress continued progress in empirical software engineering research.
△ Less
Submitted 28 July, 2021; v1 submitted 29 January, 2021;
originally announced January 2021.
-
Measuring affective states from technical debt: A psychoempirical software engineering experiment
Authors:
Jesper Olsson,
Erik Risfelt,
Terese Besker,
Antonio Martini,
Richard Torkar
Abstract:
Software engineering is a human activity. Despite this, human aspects are under-represented in technical debt research, perhaps because they are challenging to evaluate.
This study's objective was to investigate the relationship between technical debt and affective states (feelings, emotions, and moods) from software practitioners. Forty participants (N = 40) from twelve companies took part in a…
▽ More
Software engineering is a human activity. Despite this, human aspects are under-represented in technical debt research, perhaps because they are challenging to evaluate.
This study's objective was to investigate the relationship between technical debt and affective states (feelings, emotions, and moods) from software practitioners. Forty participants (N = 40) from twelve companies took part in a mixed-methods approach, consisting of a repeated-measures (r = 5) experiment (n = 200), a survey, and semi-structured interviews.
The statistical analysis shows that different design smells (strong indicators of technical debt) negatively or positively impact affective states. From the qualitative data, it is clear that technical debt activates a substantial portion of the emotional spectrum and is psychologically taxing. Further, the practitioners' reactions to technical debt appear to fall in different levels of maturity.
We argue that human aspects in technical debt are important factors to consider, as they may result in, e.g., procrastination, apprehension, and burnout.
△ Less
Submitted 2 May, 2021; v1 submitted 22 September, 2020;
originally announced September 2020.
-
An empirical study of Linespots: A novel past-fault algorithm
Authors:
Maximilian Scholz,
Richard Torkar
Abstract:
This paper proposes the novel past-faults fault prediction algorithm Linespots, based on the Bugspots algorithm. We analyze the predictive performance and runtime of Linespots compared to Bugspots with an empirical study using the most significant self-built dataset as of now, including high-quality samples for validation. As a novelty in fault prediction, we use Bayesian data analysis and Directe…
▽ More
This paper proposes the novel past-faults fault prediction algorithm Linespots, based on the Bugspots algorithm. We analyze the predictive performance and runtime of Linespots compared to Bugspots with an empirical study using the most significant self-built dataset as of now, including high-quality samples for validation. As a novelty in fault prediction, we use Bayesian data analysis and Directed Acyclic Graphs to model the effects. We found consistent improvements in the predictive performance of Linespots over Bugspots for all seven evaluation metrics. We conclude that Linespots should be used over Bugspots in all cases where no real-time performance is necessary.
△ Less
Submitted 6 July, 2021; v1 submitted 18 July, 2020;
originally announced July 2020.
-
Pandemic Programming: How COVID-19 affects software developers and how their organizations can help
Authors:
Paul Ralph,
Sebastian Baltes,
Gianisa Adisaputri,
Richard Torkar,
Vladimir Kovalenko,
Marcos Kalinowski,
Nicole Novielli,
Shin Yoo,
Xavier Devroey,
Xin Tan,
Minghui Zhou,
Burak Turhan,
Rashina Hoda,
Hideaki Hata,
Gregorio Robles,
Amin Milani Fard,
Rana Alkadhi
Abstract:
Context. As a novel coronavirus swept the world in early 2020, thousands of software developers began working from home. Many did so on short notice, under difficult and stressful conditions. Objective. This study investigates the effects of the pandemic on developers' wellbeing and productivity. Method. A questionnaire survey was created mainly from existing, validated scales and translated into…
▽ More
Context. As a novel coronavirus swept the world in early 2020, thousands of software developers began working from home. Many did so on short notice, under difficult and stressful conditions. Objective. This study investigates the effects of the pandemic on developers' wellbeing and productivity. Method. A questionnaire survey was created mainly from existing, validated scales and translated into 12 languages. The data was analyzed using non-parametric inferential statistics and structural equation modeling. Results. The questionnaire received 2225 usable responses from 53 countries. Factor analysis supported the validity of the scales and the structural model achieved a good fit (CFI = 0.961, RMSEA = 0.051, SRMR = 0.067). Confirmatory results include: (1) the pandemic has had a negative effect on developers' wellbeing and productivity; (2) productivity and wellbeing are closely related; (3) disaster preparedness, fear related to the pandemic and home office ergonomics all affect wellbeing or productivity. Exploratory analysis suggests that: (1) women, parents and people with disabilities may be disproportionately affected; (2) different people need different kinds of support. Conclusions. To improve employee productivity, software companies should focus on maximizing employee wellbeing and improving the ergonomics of employees' home offices. Women, parents and disabled persons may require extra support.
△ Less
Submitted 20 July, 2020; v1 submitted 3 May, 2020;
originally announced May 2020.
-
Estimating Return on Investment for GUI Test Automation Tools
Authors:
Felix Dobslaw,
Robert Feldt,
David Michaelsson,
Patrick Haar,
Francisco G. de Oliveira Neto,
Richard Torkar
Abstract:
Automated graphical user interface (GUI) tests can reduce manual testing activities and increase test frequency. This motivates the conversion of manual test cases into automated GUI tests. However, it is not clear whether such automation is cost-effective given that GUI automation scripts add to the code base and demand maintenance as a system evolves. In this paper, we introduce a method for est…
▽ More
Automated graphical user interface (GUI) tests can reduce manual testing activities and increase test frequency. This motivates the conversion of manual test cases into automated GUI tests. However, it is not clear whether such automation is cost-effective given that GUI automation scripts add to the code base and demand maintenance as a system evolves. In this paper, we introduce a method for estimating maintenance cost and Return on Investment (ROI) for Automated GUI Testing (AGT). The method utilizes the existing source code change history and can be used for evaluation also of other testing or quality assurance automation technologies. We evaluate the method for a real-world, industrial software system and compare two fundamentally different AGT tools, namely Selenium and EyeAutomate, to estimate and compare their ROI. We also report on their defect-finding capabilities and usability. The quantitative data is complemented by interviews with employees at the case company. The method was successfully applied and estimated maintenance cost and ROI for both tools are reported. Overall, the study supports earlier results showing that implementation time is the leading cost for introducing AGT. The findings further suggest that while EyeAutomate tests are significantly faster to implement, Selenium tests require more of a programming background but less maintenance.
△ Less
Submitted 1 November, 2019; v1 submitted 8 July, 2019;
originally announced July 2019.
-
The Connection Between Burnout and Personality Types in Software Developers
Authors:
Emanuel Mellblom,
Isar Arason,
Lucas Gren,
Richard Torkar
Abstract:
This paper examines the connection between the Five Factor Model personality traits and burnout in software developers. This study aims to validate generalizations of findings in other fields. An online survey consisting of a miniaturized International Personality Item Pool questionnaire for measuring the Five Factor Model personality traits, and the Shirom-Melamed Burnout Measure for measuring bu…
▽ More
This paper examines the connection between the Five Factor Model personality traits and burnout in software developers. This study aims to validate generalizations of findings in other fields. An online survey consisting of a miniaturized International Personality Item Pool questionnaire for measuring the Five Factor Model personality traits, and the Shirom-Melamed Burnout Measure for measuring burnout, were distributed to open source developer mailing lists, obtaining 47 valid responses. The results from a Bayesian Linear Regression analysis indicate a strong link between neuroticism and burnout confirming previous work, while the other Five Factor Model traits were not adding power to the model. It is important to note that we did not investigate the quality of work in connection to personality, nor did we take any other confounding factors into account like, for example, teamwork. Nonetheless, employers could be aware of, and support, software developers with high neuroticism.
△ Less
Submitted 22 June, 2019;
originally announced June 2019.
-
The Unfulfilled Potential of Data-Driven Decision Making in Agile Software Development
Authors:
Richard Berntsson Svensson,
Robert Feldt,
Richard Torkar
Abstract:
With the general trend towards data-driven decision making (DDDM), organizations are looking for ways to use DDDM to improve their decisions. However, few studies have looked into the practitioners view of DDDM, in particular for agile organizations. In this paper we investigated the experiences of using DDDM, and how data can improve decision making. An emailed questionnaire was sent out to 124 i…
▽ More
With the general trend towards data-driven decision making (DDDM), organizations are looking for ways to use DDDM to improve their decisions. However, few studies have looked into the practitioners view of DDDM, in particular for agile organizations. In this paper we investigated the experiences of using DDDM, and how data can improve decision making. An emailed questionnaire was sent out to 124 industry practitioners in agile software developing companies, of which 84 answered. The results show that few practitioners indicated a widespread use of DDDM in their current decision making practices. The practitioners were more positive to its future use for higher-level and more general decision making, fairly positive to its use for requirements elicitation and prioritization decisions, while being less positive to its future use at the team level. The practitioners do see a lot of potential for DDDM in an agile context; however, currently unfulfilled.
△ Less
Submitted 8 April, 2019;
originally announced April 2019.
-
Group development and group maturity when building agile teams: A qualitative and quantitative investigation at eight large companies
Authors:
Lucas Gren,
Richard Torkar,
Robert Feldt
Abstract:
The agile approach to projects focuses more on close-knit teams than traditional waterfall projects, which means that aspects of group maturity become even more important. This psychological aspect is not much researched in connection to the building of an "agile team." The purpose of this study is to investigate how building agile teams is connected to a group development model taken from social…
▽ More
The agile approach to projects focuses more on close-knit teams than traditional waterfall projects, which means that aspects of group maturity become even more important. This psychological aspect is not much researched in connection to the building of an "agile team." The purpose of this study is to investigate how building agile teams is connected to a group development model taken from social psychology. We conducted ten semi-structured interviews with coaches, Scrum Masters, and managers responsible for the agile process from seven different companies, and collected survey data from 66 group-members from four companies (a total of eight different companies). The survey included an agile measurement tool and the one part of the Group Development Questionnaire. The results show that the practitioners define group developmental aspects as key factors to a successful agile transition. Also, the quantitative measurement of agility was significantly correlated to the group maturity measurement. We conclude that adding these psychological aspects to the description of the "agile team" could increase the understanding of agility and partly help define an "agile team." We propose that future work should develop specific guidelines for how software development teams at different maturity levels might adopt agile principles and practices differently.
△ Less
Submitted 4 April, 2019;
originally announced April 2019.
-
Group Maturity and Agility, Are They Connected? - A Survey Study
Authors:
Lucas Gren,
Richard Torkar,
Robert Feldt
Abstract:
The focus on psychology has increased within software engineering due to the project management innovation "agile development processes". The agile methods do not explicitly consider group development aspects; they simply assume what is described in group psychology as mature groups. This study was conducted with 45 employees and their twelve managers (N=57) from two SAP customers in the US that w…
▽ More
The focus on psychology has increased within software engineering due to the project management innovation "agile development processes". The agile methods do not explicitly consider group development aspects; they simply assume what is described in group psychology as mature groups. This study was conducted with 45 employees and their twelve managers (N=57) from two SAP customers in the US that were working with agile methods, and the data were collected via an online survey. The selected Agility measurement was correlated to a Group Development measurement and showed significant convergent validity, i.e., a more mature team is also a more agile team. This means that the agile methods probably would benefit from taking group development into account when its practices are being introduced.
△ Less
Submitted 4 April, 2019;
originally announced April 2019.
-
The prospects of a quantitative measurement of agility: A validation study on an agile maturity model
Authors:
Lucas Gren,
Richard Torkar,
Robert Feldt
Abstract:
Agile development has now become a well-known approach to collaboration in professional work life. Both researchers and practitioners want validated tools to measure agility. This study sets out to validate an agile maturity measurement model with statistical tests and empirical data. First, a pretest was conducted as a case study including a survey and focus group. Second, the main study was cond…
▽ More
Agile development has now become a well-known approach to collaboration in professional work life. Both researchers and practitioners want validated tools to measure agility. This study sets out to validate an agile maturity measurement model with statistical tests and empirical data. First, a pretest was conducted as a case study including a survey and focus group. Second, the main study was conducted with 45 employees from two SAP customers in the US. We used internal consistency (by a Cronbach's alpha) as the main measure for reliability and analyzed construct validity by exploratory principal factor analysis (PFA). The results suggest a new categorization of a subset of items existing in the tool and provides empirical support for these new groups of factors. However, we argue that more work is needed to reach the point where a maturity models with quantitative data can be said to validly measure agility, and even then, such a measurement still needs to include some deeper analysis with cultural and contextual items.
△ Less
Submitted 4 April, 2019;
originally announced April 2019.
-
Work Motivational Challenges Regarding the Interface Between Agile Teams and a Non-Agile Surrounding Organization: A case study
Authors:
Lucas Gren,
Richard Torkar,
Robert Feldt
Abstract:
There are studies showing what happens if agile teams are introduced into a non-agile organization, e.g. higher overhead costs and the necessity of an understanding of agile methods even outside the teams. This case study shows an example of work motivational aspects that might surface when an agile team exists in the middle of a more traditional structure. This case study was conducted at a car m…
▽ More
There are studies showing what happens if agile teams are introduced into a non-agile organization, e.g. higher overhead costs and the necessity of an understanding of agile methods even outside the teams. This case study shows an example of work motivational aspects that might surface when an agile team exists in the middle of a more traditional structure. This case study was conducted at a car manufacturer in Sweden, consisting of an unstructured interview with the Scrum Master and a semi-structured focus group. The results show that the teams felt that the feedback from the surrounding organization was unsynchronized resulting in them not feeling appreciated when delivering their work. Moreover, they felt frustrated when working on non-agile teams after have been working on agile ones. This study concludes that there were work motivational affects of fitting an agile team into a non-agile surrounding organization, and therefore this might also be true for other organizations.
△ Less
Submitted 4 April, 2019;
originally announced April 2019.
-
Bayesian data analysis in empirical software engineering---The case of missing data
Authors:
Richard Torkar,
Robert Feldt,
Carlo A. Furia
Abstract:
Bayesian data analysis (BDA) is today used by a multitude of research disciplines. These disciplines use BDA as a way to embrace uncertainty by using multilevel models and making use of all available information at hand. In this chapter, we first introduce the reader to BDA and then provide an example from empirical software engineering, where we also deal with a common issue in our field, i.e., m…
▽ More
Bayesian data analysis (BDA) is today used by a multitude of research disciplines. These disciplines use BDA as a way to embrace uncertainty by using multilevel models and making use of all available information at hand. In this chapter, we first introduce the reader to BDA and then provide an example from empirical software engineering, where we also deal with a common issue in our field, i.e., missing data.
The example we make use of presents the steps done when conducting state of the art statistical analysis. First, we need to understand the problem we want to solve. Second, we conduct causal analysis. Third, we analyze non-identifiability. Fourth, we conduct missing data analysis. Finally, we do a sensitivity analysis of priors. All this before we design our statistical model. Once we have a model, we present several diagnostics one can use to conduct sanity checks.
We hope that through these examples, the reader will see the advantages of using BDA. This way, we hope Bayesian statistics will become more prevalent in our field, thus partly avoiding the reproducibility crisis we have seen in other disciplines.
△ Less
Submitted 1 January, 2020; v1 submitted 1 April, 2019;
originally announced April 2019.
-
Bayesian Data Analysis in Empirical Software Engineering Research
Authors:
Carlo A. Furia,
Robert Feldt,
Richard Torkar
Abstract:
Statistics comes in two main flavors: frequentist and Bayesian. For historical and technical reasons, frequentist statistics have traditionally dominated empirical data analysis, and certainly remain prevalent in empirical software engineering. This situation is unfortunate because frequentist statistics suffer from a number of shortcomings---such as lack of flexibility and results that are unintu…
▽ More
Statistics comes in two main flavors: frequentist and Bayesian. For historical and technical reasons, frequentist statistics have traditionally dominated empirical data analysis, and certainly remain prevalent in empirical software engineering. This situation is unfortunate because frequentist statistics suffer from a number of shortcomings---such as lack of flexibility and results that are unintuitive and hard to interpret---that curtail their effectiveness when dealing with the heterogeneous data that is increasingly available for empirical analysis of software engineering practice.
In this paper, we pinpoint these shortcomings, and present Bayesian data analysis techniques that provide tangible benefits---as they can provide clearer results that are simultaneously robust and nuanced. After a short, high-level introduction to the basic tools of Bayesian statistics, we present the reanalysis of two empirical studies on the effectiveness of automatically generated tests and the performance of programming languages. By contrasting the original frequentist analyses with our new Bayesian analyses, we demonstrate the concrete advantages of the latter. To conclude we advocate a more prominent role for Bayesian statistical techniques in empirical software engineering research and practice.
△ Less
Submitted 26 August, 2019; v1 submitted 13 November, 2018;
originally announced November 2018.
-
A Method to Assess and Argue for Practical Significance in Software Engineering
Authors:
Richard Torkar,
Carlo A. Furia,
Robert Feldt,
Francisco Gomes de Oliveira Neto,
Lucas Gren,
Per Lenberg,
Neil A. Ernst
Abstract:
A key goal of empirical research in software engineering is to assess practical significance, which answers whether the observed effects of some compared treatments show a relevant difference in practice in realistic scenarios. Even though plenty of standard techniques exist to assess statistical significance, connecting it to practical significance is not straightforward or routinely done; indeed…
▽ More
A key goal of empirical research in software engineering is to assess practical significance, which answers whether the observed effects of some compared treatments show a relevant difference in practice in realistic scenarios. Even though plenty of standard techniques exist to assess statistical significance, connecting it to practical significance is not straightforward or routinely done; indeed, only a few empirical studies in software engineering assess practical significance in a principled and systematic way.
In this paper, we argue that Bayesian data analysis provides suitable tools to assess practical significance rigorously. We demonstrate our claims in a case study comparing different test techniques. The case study's data was previously analyzed (Afzal et al., 2015) using standard techniques focusing on statistical significance. Here, we build a multilevel model of the same data, which we fit and validate using Bayesian techniques. Our method is to apply cumulative prospect theory on top of the statistical model to quantitatively connect our statistical analysis output to a practically meaningful context. This is then the basis both for assessing and arguing for practical significance.
Our study demonstrates that Bayesian analysis provides a technically rigorous yet practical framework for empirical software engineering. A substantial side effect is that any uncertainty in the underlying data will be propagated through the statistical model, and its effects on practical significance are made clear.
Thus, in combination with cumulative prospect theory, Bayesian analysis supports seamlessly assessing practical significance in an empirical software engineering context, thus potentially clarifying and extending the relevance of research for practitioners.
△ Less
Submitted 25 December, 2020; v1 submitted 26 September, 2018;
originally announced September 2018.
-
Transferring Interactive Search-Based Software Testing to Industry
Authors:
Bogdan Marculescu,
Robert Feldt,
Richard Torkar,
Simon Poulding
Abstract:
Search-Based Software Testing (SBST) is the application of optimization algorithms to problems in software testing. In previous work, we have implemented and evaluated Interactive Search-Based Software Testing (ISBST) tool prototypes, with a goal to successfully transfer the technique to industry. While SBSE solutions are often validated on benchmark problems, there is a need to validate them in a…
▽ More
Search-Based Software Testing (SBST) is the application of optimization algorithms to problems in software testing. In previous work, we have implemented and evaluated Interactive Search-Based Software Testing (ISBST) tool prototypes, with a goal to successfully transfer the technique to industry. While SBSE solutions are often validated on benchmark problems, there is a need to validate them in an operational setting. The present paper discusses the development and deployment of SBST tools for use in industry and reflects on the transfer of these techniques to industry. In addition to previous work discussing the development and validation of an ISBST prototype, a new version of the prototype ISBST system was evaluated in the laboratory and in industry. This evaluation is based on an industrial System under Test (SUT) and was carried out with industrial practitioners. The Technology Transfer Model is used as a framework to describe the progression of the development and evaluation of the ISBST system. The paper presents a synthesis of previous work developing and evaluating the ISBST prototype, as well as presenting an evaluation, in both academia and industry, of that prototype's latest version. This paper presents an overview of the development and deployment of the ISBST system in an industrial setting, using the framework of the Technology Transfer Model. We conclude that the ISBST system is capable of evolving useful test cases for that setting, though improvements in the means the system uses to communicate that information to the user are still required. In addition, a set of lessons learned from the project are listed and discussed. Our objective is to help other researchers that wish to validate search-based systems in industry and provide more information about the benefits and drawbacks of these systems.
△ Less
Submitted 24 April, 2018;
originally announced April 2018.
-
A Testability Analysis Framework for Non-Functional Properties
Authors:
Michael Felderer,
Bogdan Marculescu,
Francisco Gomes de Oliveira Neto,
Robert Feldt,
Richard Torkar
Abstract:
This paper presents background, the basic steps and an example for a testability analysis framework for non-functional properties.
This paper presents background, the basic steps and an example for a testability analysis framework for non-functional properties.
△ Less
Submitted 20 February, 2018;
originally announced February 2018.
-
Ways of Applying Artificial Intelligence in Software Engineering
Authors:
Robert Feldt,
Francisco G. de Oliveira Neto,
Richard Torkar
Abstract:
As Artificial Intelligence (AI) techniques have become more powerful and easier to use they are increasingly deployed as key components of modern software systems. While this enables new functionality and often allows better adaptation to user needs it also creates additional problems for software engineers and exposes companies to new risks. Some work has been done to better understand the intera…
▽ More
As Artificial Intelligence (AI) techniques have become more powerful and easier to use they are increasingly deployed as key components of modern software systems. While this enables new functionality and often allows better adaptation to user needs it also creates additional problems for software engineers and exposes companies to new risks. Some work has been done to better understand the interaction between Software Engineering and AI but we lack methods to classify ways of applying AI in software systems and to analyse and understand the risks this poses. Only by doing so can we devise tools and solutions to help mitigate them. This paper presents the AI in SE Application Levels (AI-SEAL) taxonomy that categorises applications according to their point of AI application, the type of AI technology used and the automation level allowed. We show the usefulness of this taxonomy by classifying 15 papers from previous editions of the RAISE workshop. Results show that the taxonomy allows classification of distinct AI applications and provides insights concerning the risks associated with them. We argue that this will be important for companies in deciding how to apply AI in their software applications and to create strategies for its use.
△ Less
Submitted 7 February, 2018; v1 submitted 6 February, 2018;
originally announced February 2018.
-
Evolution of statistical analysis in empirical software engineering research: Current state and steps forward
Authors:
Francisco Gomes de Oliveira Neto,
Richard Torkar,
Robert Feldt,
Lucas Gren,
Carlo A. Furia,
Ziwei Huang
Abstract:
Software engineering research is evolving and papers are increasingly based on empirical data from a multitude of sources, using statistical tests to determine if and to what degree empirical evidence supports their hypotheses. To investigate the practices and trends of statistical analysis in empirical software engineering (ESE), this paper presents a review of a large pool of papers from top-ran…
▽ More
Software engineering research is evolving and papers are increasingly based on empirical data from a multitude of sources, using statistical tests to determine if and to what degree empirical evidence supports their hypotheses. To investigate the practices and trends of statistical analysis in empirical software engineering (ESE), this paper presents a review of a large pool of papers from top-ranked software engineering journals. First, we manually reviewed 161 papers and in the second phase of our method, we conducted a more extensive semi-automatic classification of papers spanning the years 2001--2015 and 5,196 papers. Results from both review steps was used to: i) identify and analyze the predominant practices in ESE (e.g., using t-test or ANOVA), as well as relevant trends in usage of specific statistical methods (e.g., nonparametric tests and effect size measures) and, ii) develop a conceptual model for a statistical analysis workflow with suggestions on how to apply different statistical methods as well as guidelines to avoid pitfalls. Lastly, we confirm existing claims that current ESE practices lack a standard to report practical significance of results. We illustrate how practical significance can be discussed in terms of both the statistical analysis and in the practitioner's context.
△ Less
Submitted 10 July, 2019; v1 submitted 3 June, 2017;
originally announced June 2017.
-
Tester Interactivity makes a Difference in Search-Based Software Testing: A Controlled Experiment
Authors:
Bogdan Marculescu,
Simon Poulding,
Robert Feldt,
Kai Petersen,
Richard Torkar
Abstract:
Context: Search-based software testing promises to provide users with the ability to generate high-quality test cases, and hence increase product quality, with a minimal increase in the time and effort required. One result that emerged out of a previous study to investigate the application of search-based software testing (SBST) in an industrial setting was the development of the Interactive Searc…
▽ More
Context: Search-based software testing promises to provide users with the ability to generate high-quality test cases, and hence increase product quality, with a minimal increase in the time and effort required. One result that emerged out of a previous study to investigate the application of search-based software testing (SBST) in an industrial setting was the development of the Interactive Search-Based Software Testing (ISBST) system. ISBST allows users to interact with the underlying SBST system, guiding the search and assessing the results. An industrial evaluation indicated that the ISBST system could find test cases that are not created by testers employing manual techniques. The validity of the evaluation was threatened, however, by the low number of participants.
Objective: This paper presents a follow-up study, to provide a more rigorous evaluation of the ISBST system.
Method: To assess the ISBST system a two-way crossover controlled experiment was conducted with 58 students taking a Verification and Validation course. The NASA Task Load Index (NASA-TLX) is used to assess the workload experienced by the participants in the experiment.
Results: The experimental results validated the hypothesis that the ISBST system generates test cases that are not found by the same participants employing manual testing techniques. A follow-up laboratory experiment also investigates the importance of interaction in obtaining the results. In addition to this main result, the subjective workload was assessed for each participant by means of the NASA-TLX tool. The evaluation showed that, while the ISBST system required more effort from the participants, they achieved the same performance.
Conclusions: The paper provides evidence that the ISBST system develops test cases that are not found by manual techniques, and that interaction plays an important role in achieving that result.
△ Less
Submitted 15 December, 2015;
originally announced December 2015.