-
Beyond the Comfort Zone: Emerging Solutions to Overcome Challenges in Integrating LLMs into Software Products
Authors:
Nadia Nahar,
Christian Kästner,
Jenna Butler,
Chris Parnin,
Thomas Zimmermann,
Christian Bird
Abstract:
Large Language Models (LLMs) are increasingly embedded into software products across diverse industries, enhancing user experiences, but at the same time introducing numerous challenges for developers. Unique characteristics of LLMs force developers, who are accustomed to traditional software development and evaluation, out of their comfort zones as the LLM components shatter standard assumptions about software systems. This study explores the emerging solutions that software developers are adopting to navigate the encountered challenges. Leveraging mixed-methods research, including 26 interviews and a survey with 332 responses, the study identifies 19 emerging solutions regarding quality assurance that practitioners across several product teams at Microsoft are exploring. The findings provide valuable insights that can guide the development and evaluation of LLM-based products more broadly in the face of these challenges.
Submitted 15 October, 2024;
originally announced October 2024.
-
What Is Wrong with My Model? Identifying Systematic Problems with Semantic Data Slicing
Authors:
Chenyang Yang,
Yining Hong,
Grace A. Lewis,
Tongshuang Wu,
Christian Kästner
Abstract:
Machine learning models make mistakes, yet sometimes it is difficult to identify the systematic problems behind the mistakes. Practitioners engage in various activities, including error analysis, testing, auditing, and red-teaming, to form hypotheses of what can go (or has gone) wrong with their models. To validate these hypotheses, practitioners employ data slicing to identify relevant examples. However, traditional data slicing is limited by available features and programmatic slicing functions. In this work, we propose SemSlicer, a framework that supports semantic data slicing, which identifies a semantically coherent slice, without the need for existing features. SemSlicer uses Large Language Models to annotate datasets and generate slices from any user-defined slicing criteria. We show that SemSlicer generates accurate slices with low cost, allows flexible trade-offs between different design dimensions, reliably identifies under-performing data slices, and helps practitioners identify useful data slices that reflect systematic problems.
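A minimal sketch of the semantic-slicing idea: an annotator decides, for each example, whether it satisfies a natural-language slicing criterion, and the resulting slice's accuracy is compared with overall accuracy to surface an under-performing slice. In SemSlicer the annotator is an LLM; here `keyword_annotator` is a hypothetical offline stand-in, and the data and function names are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Example:
    text: str
    label: int
    prediction: int


# An annotator answers: "does this text satisfy the slicing criterion?"
# SemSlicer prompts an LLM for this; the stand-in below is hypothetical.
Annotator = Callable[[str, str], bool]


def keyword_annotator(criterion: str, text: str) -> bool:
    # Hypothetical: treat the criterion as a comma-separated keyword list.
    keywords = [k.strip().lower() for k in criterion.split(",")]
    return any(k in text.lower() for k in keywords)


def semantic_slice(data: List[Example], criterion: str, annotate: Annotator) -> List[Example]:
    """Return the examples that the annotator marks as matching the criterion."""
    return [ex for ex in data if annotate(criterion, ex.text)]


def accuracy(data: List[Example]) -> float:
    if not data:
        return float("nan")
    return sum(ex.label == ex.prediction for ex in data) / len(data)


if __name__ == "__main__":
    data = [
        Example("The referee was terrible today", 0, 1),
        Example("Great goal in the final minute", 1, 1),
        Example("Stock prices fell sharply", 0, 0),
        Example("The match was a disappointing draw", 0, 1),
    ]
    sports = semantic_slice(data, "referee, goal, match", keyword_annotator)
    print(f"overall accuracy: {accuracy(data):.2f}")
    print(f"slice size: {len(sports)}, slice accuracy: {accuracy(sports):.2f}")
```

Swapping `keyword_annotator` for an LLM-backed function with the same signature is where the semantic part would come in; the slicing and evaluation logic stays unchanged.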
Submitted 13 September, 2024;
originally announced September 2024.
-
S3C2 Summit 2023-11: Industry Secure Supply Chain Summit
Authors:
Nusrat Zahan,
Yasemin Acar,
Michel Cukier,
William Enck,
Christian Kästner,
Alexandros Kapravelos,
Dominik Wermke,
Laurie Williams
Abstract:
Cyber attacks leveraging or targeting the software supply chain, such as the SolarWinds and the Log4j incidents, affected thousands of businesses and their customers, drawing attention from both industry and government stakeholders. To foster open dialogue, facilitate mutual sharing, and discuss shared challenges encountered by stakeholders in securing their software supply chain, researchers from the NSF-supported Secure Software Supply Chain Center (S3C2) organize Secure Supply Chain Summits with stakeholders. This paper summarizes the Industry Secure Supply Chain Summit held on November 16, 2023, which consisted of panel discussions with a diverse set of practitioners from the industry. The individual panels were framed with open-ended questions and included the topics of Software Bills of Materials (SBOMs), vulnerable dependencies, malicious commits, build and deploy infrastructure, reducing entire classes of vulnerabilities at scale, and supporting a company culture conducive to securing the software supply chain. The goal of this summit was to enable open discussions, mutual sharing, and shedding light on common challenges that industry practitioners with practical experience face when securing their software supply chain.
Submitted 29 August, 2024;
originally announced August 2024.
-
Training on the Edge of Stability Is Caused by Layerwise Jacobian Alignment
Authors:
Mark Lowell,
Catharine Kastner
Abstract:
During neural network training, the sharpness of the Hessian matrix of the training loss rises until training is on the edge of stability. As a result, even nonstochastic gradient descent does not accurately model the underlying dynamical system defined by the gradient flow of the training loss. We use an exponential Euler solver to train the network without entering the edge of stability, so that we accurately approximate the true gradient descent dynamics. We demonstrate experimentally that the increase in the sharpness of the Hessian matrix is caused by the layerwise Jacobian matrices of the network becoming aligned, so that a small change in the network preactivations near the inputs of the network can cause a large change in the outputs of the network. We further demonstrate that the degree of alignment scales with the size of the dataset by a power law with a coefficient of determination between 0.74 and 0.98.
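The edge of stability is commonly explained through a classical condition: gradient descent with step size η on a quadratic loss with curvature (sharpness) λ stays bounded only if λ < 2/η. The sketch below shows that condition on a 1-D toy quadratic and contrasts it with an exponential Euler step, which is exact for the linear gradient flow dx/dt = -λx. This is a textbook illustration under those assumptions, not the paper's layerwise Jacobian analysis or its solver.

```python
import math


def gd_step(x: float, lam: float, eta: float) -> float:
    # Gradient descent on L(x) = 0.5 * lam * x**2, whose gradient is lam * x.
    return x - eta * lam * x


def exp_euler_step(x: float, lam: float, eta: float) -> float:
    # For the linear gradient flow dx/dt = -lam * x, the exponential Euler
    # step is exact: x(t + eta) = exp(-lam * eta) * x(t).
    return math.exp(-lam * eta) * x


def run(step, x0: float, lam: float, eta: float, n: int) -> float:
    x = x0
    for _ in range(n):
        x = step(x, lam, eta)
    return x


if __name__ == "__main__":
    eta = 0.1  # step size; the GD stability threshold on sharpness is 2 / eta = 20
    for lam in (10.0, 19.0, 21.0):
        gd = run(gd_step, 1.0, lam, eta, 50)
        ee = run(exp_euler_step, 1.0, lam, eta, 50)
        print(f"sharpness={lam:5.1f}  GD |x|={abs(gd):.3e}  exp-Euler |x|={abs(ee):.3e}")
```

Each GD step multiplies x by (1 - ηλ), so once the sharpness exceeds 2/η the iterates oscillate with growing magnitude, while the exponential Euler iterates keep decaying like the continuous gradient flow.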
Submitted 31 May, 2024;
originally announced June 2024.
-
S3C2 Summit 2024-03: Industry Secure Supply Chain Summit
Authors:
Greg Tystahl,
Yasemin Acar,
Michel Cukier,
William Enck,
Christian Kastner,
Alexandros Kapravelos,
Dominik Wermke,
Laurie Williams
Abstract:
Supply chain security has become a very important vector to consider when defending against adversarial attacks. As a result, more and more developers are keen on improving their supply chains to make them more robust against future threats. On March 7th, 2024, researchers from the Secure Software Supply Chain Center (S3C2) gathered 14 industry leaders, developers and consumers of the open source ecosystem to discuss the state of supply chain security. The goal of the summit was to share insights between companies and developers alike to foster new collaborations and ideas moving forward. During the meeting, participants were asked questions about best practices and their thoughts on how to improve things in the future. In this paper we summarize the responses and discussions of the summit. The panel questions can be found in the appendix.
Submitted 14 May, 2024;
originally announced May 2024.
-
(Why) Is My Prompt Getting Worse? Rethinking Regression Testing for Evolving LLM APIs
Authors:
Wanqin Ma,
Chenyang Yang,
Christian Kästner
Abstract:
Large Language Models (LLMs) are increasingly integrated into software applications. Downstream application developers often access LLMs through APIs provided as a service. However, LLM APIs are often updated silently and scheduled to be deprecated, forcing users to continuously adapt to evolving models. This can cause performance regression and affect prompt design choices, as evidenced by our case study on toxicity detection. Based on our case study, we emphasize the need for and re-examine the concept of regression testing for evolving LLM APIs. We argue that regression testing LLMs requires fundamental changes to traditional testing approaches, due to different correctness notions, prompting brittleness, and non-determinism in LLM APIs.
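A minimal sketch of one way regression testing could account for non-determinism: each test prompt is sampled several times per model version, pass rates are compared, and a test is flagged only when its pass rate drops by more than a tolerance. The two "model versions" are hypothetical stub functions standing in for an LLM API, and the sample count and tolerance are illustrative assumptions, not values from the paper.

```python
import random
from typing import Callable, Dict, List, Tuple

# A model version takes a prompt plus a random source (to simulate sampling).
Model = Callable[[str, random.Random], str]


def pass_rate(model: Model, prompt: str, expected: str, samples: int, seed: int) -> float:
    """Fraction of sampled outputs that match the expected label."""
    rng = random.Random(seed)
    hits = sum(model(prompt, rng) == expected for _ in range(samples))
    return hits / samples


def regressions(old: Model, new: Model, tests: List[Tuple[str, str]],
                samples: int = 20, tolerance: float = 0.1,
                seed: int = 0) -> Dict[str, Tuple[float, float]]:
    """Report tests whose pass rate drops by more than `tolerance`."""
    flagged = {}
    for prompt, expected in tests:
        before = pass_rate(old, prompt, expected, samples, seed)
        after = pass_rate(new, prompt, expected, samples, seed)
        if before - after > tolerance:
            flagged[prompt] = (before, after)
    return flagged


# Hypothetical stand-ins for two versions of a toxicity classifier behind an API.
def old_model(prompt: str, rng: random.Random) -> str:
    return "toxic" if "idiot" in prompt else "non-toxic"


def new_model(prompt: str, rng: random.Random) -> str:
    # Simulated silent update: sarcastic praise is now labeled inconsistently.
    if "idiot" in prompt:
        return "toxic"
    if "great job" in prompt:
        return "toxic" if rng.random() < 0.5 else "non-toxic"
    return "non-toxic"


if __name__ == "__main__":
    tests = [("you idiot", "toxic"),
             ("great job, genius", "non-toxic"),
             ("have a nice day", "non-toxic")]
    for prompt, (before, after) in regressions(old_model, new_model, tests).items():
        print(f"REGRESSION {prompt!r}: pass rate {before:.2f} -> {after:.2f}")
```

Comparing pass rates rather than single outputs is one way to avoid flagging every run-to-run difference as a regression; the tolerance trades sensitivity against noise.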
Submitted 6 February, 2024; v1 submitted 18 November, 2023;
originally announced November 2023.
-
Beyond Testers' Biases: Guiding Model Testing with Knowledge Bases using LLMs
Authors:
Chenyang Yang,
Rishabh Rustogi,
Rachel Brower-Sinning,
Grace A. Lewis,
Christian Kästner,
Tongshuang Wu
Abstract:
Current model testing work has mostly focused on creating test cases. Identifying what to test is a step that is largely ignored and poorly supported. We propose Weaver, an interactive tool that supports requirements elicitation for guiding model testing. Weaver uses large language models to generate knowledge bases and recommends concepts from them interactively, allowing testers to elicit requirements for further testing. Weaver provides rich external knowledge to testers and encourages testers to systematically explore diverse concepts beyond their own biases. In a user study, we show that both NLP experts and non-experts identified more concepts worth testing, as well as more diverse ones, when using Weaver. Collectively, they found more than 200 failing test cases for stance detection with zero-shot ChatGPT. Our case studies further show that Weaver can help practitioners test models in real-world settings, where developers define more nuanced application scenarios (e.g., code understanding and transcript summarization) using LLMs.
Submitted 14 October, 2023;
originally announced October 2023.
-
S3C2 Summit 2023-06: Government Secure Supply Chain Summit
Authors:
William Enck,
Yasemin Acar,
Michel Cukier,
Alexandros Kapravelos,
Christian Kästner,
Laurie Williams
Abstract:
Recent years have shown increased cyber attacks targeting less secure elements in the software supply chain and causing fatal damage to businesses and organizations. Past well-known examples of software supply chain attacks are the SolarWinds or log4j incidents that have affected thousands of customers and businesses. The US government and industry are equally interested in enhancing software supply chain security. On June 7, 2023, researchers from the NSF-supported Secure Software Supply Chain Center (S3C2) conducted a Secure Software Supply Chain Summit with a diverse set of 17 practitioners from 13 government agencies. The goal of the Summit was two-fold: (1) to share our observations from our previous two summits with industry, and (2) to enable sharing between individuals at the government agencies regarding practical experiences and challenges with software supply chain security. For each discussion topic, we presented our observations and take-aways from the industry summits to spur conversation. We specifically focused on the Executive Order 14028, software bill of materials (SBOMs), choosing new dependencies, provenance and self-attestation, and large language models. The open discussions enabled mutual sharing and shed light on common challenges that government agencies see as impacting government and industry practitioners when securing their software supply chain. In this paper, we provide a summary of the Summit.
Submitted 13 August, 2023;
originally announced August 2023.
-
The Product Beyond the Model -- An Empirical Study of Repositories of Open-Source ML Products
Authors:
Nadia Nahar,
Haoran Zhang,
Grace Lewis,
Shurui Zhou,
Christian Kästner
Abstract:
Machine learning (ML) components are increasingly incorporated into software products for end-users, but developers face challenges in transitioning from ML prototypes to products. Academics have limited access to the source of commercial ML products, hindering research progress to address these challenges. In this study, first and foremost, we contribute a dataset of 262 open-source ML products for end users (not just models), identified among more than half a million ML-related projects on GitHub. Then, we qualitatively and quantitatively analyze 30 open-source ML products to answer six broad research questions about development practices and system architecture. We find that the majority of the ML products in our sample represent more startup-style development than reported in past interview studies. We report 21 findings, including limited involvement of data scientists in many open-source ML products, unusually low modularity between ML and non-ML code, diverse architectural choices on incorporating models into products, and limited prevalence of industry best practices such as model testing, pipeline automation, and monitoring. Additionally, we discuss seven implications of this study on research, development, and education, including the need for tools to assist teams without data scientists, education opportunities, and open-source-specific research for privacy-preserving telemetry.
Submitted 15 August, 2024; v1 submitted 8 August, 2023;
originally announced August 2023.
-
S3C2 Summit 2023-02: Industry Secure Supply Chain Summit
Authors:
Trevor Dunlap,
Yasemin Acar,
Michel Cukier,
William Enck,
Alexandros Kapravelos,
Christian Kastner,
Laurie Williams
Abstract:
Recent years have shown increased cyber attacks targeting less secure elements in the software supply chain and causing fatal damage to businesses and organizations. Past well-known examples of software supply chain attacks are the SolarWinds or log4j incidents that have affected thousands of customers and businesses. The US government and industry are equally interested in enhancing software supply chain security. On February 22, 2023, researchers from the NSF-supported Secure Software Supply Chain Center (S3C2) conducted a Secure Software Supply Chain Summit with a diverse set of 17 practitioners from 15 companies. The goal of the Summit was to enable sharing between industry practitioners having practical experiences and challenges with software supply chain security and to help form new collaborations. We conducted six panel discussions based upon open-ended questions regarding software bill of materials (SBOMs), malicious commits, choosing new dependencies, build and deploy, the Executive Order 14028, and vulnerable dependencies. The open discussions enabled mutual sharing and shed light on common challenges that industry practitioners with practical experience face when securing their software supply chain. In this paper, we provide a summary of the Summit. Full panel questions can be found in the appendix.
Submitted 31 July, 2023;
originally announced July 2023.
-
S3C2 Summit 2022-09: Industry Secure Supply Chain Summit
Authors:
Mindy Tran,
Yasemin Acar,
Michel Cukier,
William Enck,
Alexandros Kapravelos,
Christian Kastner,
Laurie Williams
Abstract:
Recent years have shown increased cyber attacks targeting less secure elements in the software supply chain and causing fatal damage to businesses and organizations. Past well-known examples of software supply chain attacks are the SolarWinds or log4j incidents that have affected thousands of customers and businesses. The US government and industry are equally interested in enhancing software supply chain security. We conducted six panel discussions with a diverse set of 19 practitioners from industry. We asked them open-ended questions regarding SBOMs, vulnerable dependencies, malicious commits, build and deploy, the Executive Order, and standards compliance. The goal of this summit was to enable open discussions, mutual sharing, and shedding light on common challenges that industry practitioners with practical experience face when securing their software supply chain. This paper summarizes the summit held on September 30, 2022.
Submitted 28 July, 2023;
originally announced July 2023.
-
A Meta-Summary of Challenges in Building Products with ML Components -- Collecting Experiences from 4758+ Practitioners
Authors:
Nadia Nahar,
Haoran Zhang,
Grace Lewis,
Shurui Zhou,
Christian Kästner
Abstract:
Incorporating machine learning (ML) components into software products raises new software-engineering challenges and exacerbates existing challenges. Many researchers have invested significant effort in understanding the challenges of industry practitioners working on building products with ML components, through interviews and surveys with practitioners. With the intention to aggregate and present their collective findings, we conduct a meta-summary study: following guidelines for systematic literature reviews, we collect 50 relevant papers that together interacted with over 4758 practitioners. We then collect, group, and organize the over 500 mentions of challenges within those papers. We highlight the most commonly reported challenges and hope this meta-summary will be a useful resource for the research community to prioritize research and education in this field.
Submitted 31 March, 2023;
originally announced April 2023.
-
MLTEing Models: Negotiating, Evaluating, and Documenting Model and System Qualities
Authors:
Katherine R. Maffey,
Kyle Dotterrer,
Jennifer Niemann,
Iain Cruickshank,
Grace A. Lewis,
Christian Kästner
Abstract:
Many organizations seek to ensure that machine learning (ML) and artificial intelligence (AI) systems work as intended in production but currently do not have a cohesive methodology in place to do so. To fill this gap, we propose MLTE (Machine Learning Test and Evaluation, colloquially referred to as "melt"), a framework and implementation to evaluate ML models and systems. The framework compiles state-of-the-art evaluation techniques into an organizational process for interdisciplinary teams, including model developers, software engineers, system owners, and other stakeholders. MLTE tooling supports this process by providing a domain-specific language that teams can use to express model requirements, an infrastructure to define, generate, and collect ML evaluation metrics, and the means to communicate results.
Submitted 3 March, 2023;
originally announced March 2023.
-
Capabilities for Better ML Engineering
Authors:
Chenyang Yang,
Rachel Brower-Sinning,
Grace A. Lewis,
Christian Kästner,
Tongshuang Wu
Abstract:
In spite of machine learning's rapid growth, its engineering support is scattered in many forms, and tends to favor certain engineering stages, stakeholders, and evaluation preferences. We envision a capability-based framework, which uses fine-grained specifications for ML model behaviors to unite existing efforts towards better ML engineering. We use concrete scenarios (model design, debugging, and maintenance) to articulate capabilities' broad applications across various dimensions, and their impact on building safer, more generalizable and more trustworthy models that reflect human needs. Through preliminary experiments, we show capabilities' potential for reflecting model generalizability, which can provide guidance for the ML engineering process. We discuss challenges and opportunities for integrating capabilities into ML engineering.
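A minimal sketch of what reporting by capability (rather than by one aggregate score) can look like: each test case is tagged with the behavior it checks, and pass rates are computed per tag. The capability names, test cases, and the keyword model are hypothetical illustrations, not the framework proposed in the paper.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Each test case: (capability name, input text, expected label).
TestCase = Tuple[str, str, str]


def capability_report(model: Callable[[str], str], tests: List[TestCase]) -> Dict[str, float]:
    """Pass rate per capability instead of a single aggregate accuracy."""
    passed: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for capability, text, expected in tests:
        total[capability] += 1
        passed[capability] += int(model(text) == expected)
    return {cap: passed[cap] / total[cap] for cap in total}


def toy_sentiment(text: str) -> str:
    # Hypothetical keyword model that ignores negation entirely.
    return "pos" if "good" in text else "neg"


if __name__ == "__main__":
    tests = [
        ("vocabulary", "a good movie", "pos"),
        ("vocabulary", "a dull movie", "neg"),
        ("negation", "not a good movie", "neg"),
        ("negation", "not a dull movie", "pos"),
    ]
    for cap, rate in capability_report(toy_sentiment, tests).items():
        print(f"{cap}: {rate:.0%}")
```

The aggregate accuracy here would be 50%, which hides that the toy model handles vocabulary perfectly and negation not at all; the per-capability breakdown makes that difference visible.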
Submitted 10 February, 2023; v1 submitted 11 November, 2022;
originally announced November 2022.
-
Data Leakage in Notebooks: Static Detection and Better Processes
Authors:
Chenyang Yang,
Rachel A Brower-Sinning,
Grace A. Lewis,
Christian Kästner
Abstract:
Data science pipelines to train and evaluate models with machine learning may contain bugs just like any other code. Leakage between training and test data can lead to overestimating the model's accuracy during offline evaluations, possibly leading to deployment of low-quality models in production. Such leakage can happen easily by mistake or by following poor practices, but may be tedious and challenging to detect manually. We develop a static analysis approach to detect common forms of data leakage in data science code. Our evaluation shows that our analysis accurately detects data leakage and that such leakage is pervasive among over 100,000 analyzed public notebooks. We discuss how our static analysis approach can help both practitioners and educators, and how leakage prevention can be designed into the development process.
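One commonly described form of train/test leakage is preprocessing leakage: normalization statistics are computed over the full dataset before splitting, so information from the test rows flows into the training features. The sketch below contrasts the leaky and the corrected ordering in plain Python; it illustrates the bug pattern only and is not the paper's static analysis. The function names and the 80/20 split are illustrative assumptions.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def standardize(values: List[float], mu: float, sigma: float) -> List[float]:
    return [(v - mu) / sigma for v in values]


def split(values: List[float], train_frac: float = 0.8) -> Tuple[List[float], List[float]]:
    cut = int(len(values) * train_frac)
    return values[:cut], values[cut:]


def leaky_pipeline(data: List[float]) -> Tuple[List[float], List[float]]:
    # LEAK: mean and std are computed over all rows, including the test rows.
    mu, sigma = mean(data), pstdev(data)
    return split(standardize(data, mu, sigma))


def clean_pipeline(data: List[float]) -> Tuple[List[float], List[float]]:
    # Split first, then fit the statistics on the training portion only.
    train, test = split(data)
    mu, sigma = mean(train), pstdev(train)
    return standardize(train, mu, sigma), standardize(test, mu, sigma)


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0]
    _, leaky_test = leaky_pipeline(data)
    _, clean_test = clean_pipeline(data)
    print("leaky test features:", [round(v, 2) for v in leaky_test])
    print("clean test features:", [round(v, 2) for v in clean_test])
```

In the leaky version, the outlier in the test split shifts the statistics used for the training features; a static analysis looking for this pattern would track whether data flowing into a fitted transformation later reaches the test set.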
Submitted 7 September, 2022;
originally announced September 2022.
-
Aspirations and Practice of Model Documentation: Moving the Needle with Nudging and Traceability
Authors:
Avinash Bhat,
Austin Coursey,
Grace Hu,
Sixian Li,
Nadia Nahar,
Shurui Zhou,
Christian Kästner,
Jin L. C. Guo
Abstract:
The documentation practice for machine-learned (ML) models often falls short of established practices for traditional software, which impedes model accountability and inadvertently abets inappropriate use or misuse of models. Recently, model cards, a proposal for model documentation, have attracted notable attention, but their impact on the actual practice is unclear. In this work, we systematically study the model documentation in the field and investigate how to encourage more responsible and accountable documentation practice. Our analysis of publicly available model cards reveals a substantial gap between the proposal and the practice. We then design a tool named DocML aiming to (1) nudge the data scientists to comply with the model cards proposal during the model development, especially the sections related to ethics, and (2) assess and manage the documentation quality. A lab study reveals the benefit of our tool towards long-term documentation quality and accountability.
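A minimal sketch of a completeness check in the spirit of nudging: it scans a markdown model card for the section names from the original Model Cards proposal and lists missing sections, putting ethics-related ones first. The heading-matching approach and function names are assumptions for illustration, not DocML's implementation.

```python
import re
from typing import List

# Section names from the Model Cards proposal (Mitchell et al.).
MODEL_CARD_SECTIONS = [
    "Model Details", "Intended Use", "Factors", "Metrics", "Evaluation Data",
    "Training Data", "Quantitative Analyses", "Ethical Considerations",
    "Caveats and Recommendations",
]
ETHICS_SECTIONS = {"Ethical Considerations", "Caveats and Recommendations"}


def headings(markdown: str) -> List[str]:
    """Extract markdown ATX heading titles (lines starting with 1-6 '#')."""
    return [m.group(1).strip()
            for m in re.finditer(r"^#{1,6}\s+(.+)$", markdown, re.MULTILINE)]


def missing_sections(markdown: str) -> List[str]:
    present = {h.lower() for h in headings(markdown)}
    return [s for s in MODEL_CARD_SECTIONS if s.lower() not in present]


def nudge(markdown: str) -> List[str]:
    """Reminder messages for missing sections, ethics-related gaps first."""
    missing = missing_sections(markdown)
    # Stable sort: False (ethics section) sorts before True (other section).
    ordered = sorted(missing, key=lambda s: s not in ETHICS_SECTIONS)
    return [f"Missing section: {s}" + (" (ethics)" if s in ETHICS_SECTIONS else "")
            for s in ordered]


if __name__ == "__main__":
    card = "# Model Details\nA sentiment classifier.\n\n## Metrics\nAccuracy on held-out reviews.\n"
    for msg in nudge(card):
        print(msg)
```

A check like this only detects absent headings, not low-quality content under a heading, so it covers the nudging side more than the quality-assessment side described in the abstract.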
Submitted 8 February, 2023; v1 submitted 13 April, 2022;
originally announced April 2022.
-
On Debugging the Performance of Configurable Software Systems: Developer Needs and Tailored Tool Support
Authors:
Miguel Velez,
Pooyan Jamshidi,
Norbert Siegmund,
Sven Apel,
Christian Kästner
Abstract:
Determining whether a configurable software system has a performance bug or was misconfigured is often challenging. While there are numerous debugging techniques that can support developers in this task, there is limited empirical evidence of how useful the techniques are to address the actual needs that developers have when debugging the performance of configurable software systems; most techniques are evaluated in terms of technical accuracy rather than usability. In this paper, we take a human-centered approach to identify, design, implement, and evaluate a solution to support developers in the process of debugging the performance of configurable software systems. We first conduct an exploratory study with 19 developers to identify the information needs that developers have during this process. Subsequently, we design and implement a tailored tool, adapting techniques from prior work, to support those needs. Two user studies, with a total of 20 developers, validate and confirm that the information that we provide helps developers debug the performance of configurable software systems.
Submitted 19 March, 2022;
originally announced March 2022.
-
Collaboration Challenges in Building ML-Enabled Systems: Communication, Documentation, Engineering, and Process
Authors:
Nadia Nahar,
Shurui Zhou,
Grace Lewis,
Christian Kästner
Abstract:
The introduction of machine learning (ML) components in software projects has created the need for software engineers to collaborate with data scientists and other specialists. While collaboration can always be challenging, ML introduces additional challenges with its exploratory model development process, additional skills and knowledge needed, difficulties testing ML systems, need for continuous evolution and monitoring, and non-traditional quality requirements such as fairness and explainability. Through interviews with 45 practitioners from 28 organizations, we identified key collaboration challenges that teams face when building and deploying ML systems into production. We report on common collaboration points in the development of production ML systems for requirements, data, and integration, as well as corresponding team patterns and challenges. We find that most of these challenges center around communication, documentation, engineering, and process and collect recommendations to address these challenges.
Submitted 10 February, 2022; v1 submitted 19 October, 2021;
originally announced October 2021.
-
Feature Interactions on Steroids: On the Composition of ML Models
Authors:
Christian Kästner,
Eunsuk Kang,
Sven Apel
Abstract:
The lack of specifications is a key difference between traditional software engineering and machine learning. We discuss how it drastically impacts how we think about divide-and-conquer approaches to system design, and how it impacts reuse, testing and debugging activities. Traditionally, specifications provide a cornerstone for compositional reasoning and for the divide-and-conquer strategy of how we build large and complex systems from components, but those are hard to come by for machine-learned components. While the lack of specification seems like a fundamental new problem at first sight, in fact software engineers routinely deal with iffy specifications in practice: we face weak specifications, wrong specifications, and unanticipated interactions among components and their specifications. Machine learning may push us further, but the problems are not fundamentally new. Rethinking machine-learning model composition from the perspective of the feature interaction problem may even teach us a thing or two on how to move forward, including the importance of integration testing, of requirements engineering, and of design.
Submitted 13 May, 2021;
originally announced May 2021.
-
Containing Malicious Package Updates in npm with a Lightweight Permission System
Authors:
Gabriel Ferreira,
Limin Jia,
Joshua Sunshine,
Christian Kästner
Abstract:
The large number of third-party packages available in fast-moving software ecosystems, such as Node.js/npm, enables attackers to compromise applications by pushing malicious updates to their package dependencies. Studying the npm repository, we observed that many packages used in Node.js applications perform only simple computations and do not need access to filesystem or network APIs. This offers the opportunity to enforce least-privilege design per package, protecting applications and package dependencies from malicious updates. We propose a lightweight permission system that protects Node.js applications by enforcing package permissions at runtime. We discuss the design space of solutions and show that our system makes a large number of packages much harder to exploit, almost for free.
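The idea of per-package least privilege can be illustrated with a small runtime check. The following is a toy Python model, not the authors' Node.js system. The manifest format and the names `MANIFEST`, `check`, `guarded_read` and `CapabilityDenied` are hypothetical. A purely computational package is granted no capabilities, so a malicious update that starts reading files fails at the call site.

```python
from typing import Dict, FrozenSet

# Hypothetical permission manifest (not the paper's format): package name ->
# capabilities it may use. Purely computational packages get an empty set.
MANIFEST: Dict[str, FrozenSet[str]] = {
    "left-pad": frozenset(),
    "http-client": frozenset({"network"}),
    "config-loader": frozenset({"fs"}),
}


class CapabilityDenied(PermissionError):
    """Raised when a package uses a capability it was not granted."""


def check(package: str, capability: str) -> None:
    """Least privilege: packages missing from the manifest get no capabilities."""
    if capability not in MANIFEST.get(package, frozenset()):
        raise CapabilityDenied(f"{package} may not use {capability}")


def guarded_read(package: str, path: str) -> str:
    """Filesystem access attributed to `package`; checked before the file is opened."""
    check(package, "fs")
    with open(path, encoding="utf-8") as f:
        return f.read()
```

For example, `guarded_read("left-pad", "/etc/passwd")` raises `CapabilityDenied` before any file is touched. How calls are attributed to the calling package is the hard part in a real runtime, and this sketch simply takes it as a parameter.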
Submitted 7 March, 2021;
originally announced March 2021.
-
White-Box Analysis over Machine Learning: Modeling Performance of Configurable Systems
Authors:
Miguel Velez,
Pooyan Jamshidi,
Norbert Siegmund,
Sven Apel,
Christian Kästner
Abstract:
Performance-influence models can help stakeholders understand how and where configuration options and their interactions influence the performance of a system. With this understanding, stakeholders can debug performance behavior and make deliberate configuration decisions. Current black-box techniques to build such models combine various sampling and learning strategies, resulting in tradeoffs between measurement effort, accuracy, and interpretability. We present Comprex, a white-box approach to build performance-influence models for configurable systems, combining insights of local measurements, dynamic taint analysis to track options in the implementation, compositionality, and compression of the configuration space, without relying on machine learning to extrapolate incomplete samples. Our evaluation on 4 widely-used, open-source projects demonstrates that Comprex builds similarly accurate performance-influence models to the most accurate and expensive black-box approach, but at a reduced cost and with additional benefits from interpretable and local models.
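Performance-influence models are commonly written as a sum of terms: a base value, one term per option, and terms for interactions among options. The Python sketch below shows only that model form and how it predicts performance for one configuration. The option names and coefficients are invented for illustration. It does not show how Comprex derives the terms through taint analysis and local measurements.

```python
from typing import Dict, FrozenSet, Mapping

# Hypothetical model, coefficients in seconds (invented for illustration).
# Each key is the set of options that must all be enabled for its term to apply;
# the empty set is the base execution time.
MODEL: Dict[FrozenSet[str], float] = {
    frozenset(): 10.0,
    frozenset({"compression"}): 4.0,
    frozenset({"encryption"}): 6.0,
    frozenset({"compression", "encryption"}): -2.5,  # pairwise interaction
}


def predict(model: Mapping[FrozenSet[str], float], config: Mapping[str, bool]) -> float:
    """Sum the coefficients of every term whose options are all enabled."""
    enabled = {option for option, on in config.items() if on}
    return sum(coef for term, coef in model.items() if term <= enabled)
```

With both options enabled, the prediction is 10.0 + 4.0 + 6.0 - 2.5 = 17.5. The negative interaction term captures that the two options together cost less than their individual influences suggest.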
Submitted 13 January, 2021;
originally announced January 2021.
-
Understanding the Nature of System-Related Issues in Machine Learning Frameworks: An Exploratory Study
Authors:
Yang Ren,
Gregory Gay,
Christian Kästner,
Pooyan Jamshidi
Abstract:
Modern systems are built using development frameworks. These frameworks have a major impact on how the resulting system executes, how configurations are managed, how it is tested, and how and where it is deployed. Machine learning (ML) frameworks and the systems developed using them differ greatly from traditional frameworks. Naturally, the issues that manifest in such frameworks may differ as well---as may the behavior of developers addressing those issues. We are interested in characterizing the system-related issues---issues impacting performance, memory and resource usage, and other quality attributes---that emerge in ML frameworks, and how they differ from those in traditional frameworks. We have conducted a moderate-scale exploratory study analyzing real-world system-related issues from 10 popular machine learning frameworks. Our findings offer implications for the development of machine learning systems, including differences in the frequency of occurrence of certain issue types, observations regarding the impact of debate and time on issue correction, and differences in the specialization of developers. We hope that this exploratory study will enable developers to improve their expectations, plan for risk, and allocate resources accordingly when making use of the tools provided by these frameworks to develop ML-based systems.
Submitted 12 May, 2020;
originally announced May 2020.
-
Efficiently Finding Higher-Order Mutants
Authors:
Chu-Pan Wong,
Jens Meinicke,
Leo Chen,
João P. Diniz,
Christian Kästner,
Eduardo Figueiredo
Abstract:
Higher-order mutation has the potential to address major drawbacks of traditional first-order mutation, for example by simulating more realistic faults or improving test optimization techniques. Despite interest in studying promising higher-order mutants, such mutants are difficult to find due to the exponential search space of mutation combinations. State-of-the-art approaches rely on genetic search, which is often incomplete and expensive due to its stochastic nature. First, we propose a novel way of finding a complete set of higher-order mutants by using variational execution, a technique that can, in many cases, explore large search spaces completely and often efficiently. Second, we use the identified complete set of higher-order mutants to study their characteristics. Finally, we use the identified characteristics to design and evaluate a new search strategy, independent of variational execution, that is highly effective at finding higher-order mutants even in large code bases.
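To make the search target concrete, the Python sketch below enumerates second-order mutants of a toy program by brute force. This exhaustive enumeration is exactly the exponential search the paper avoids, so it does not illustrate variational execution.

The sketch assumes the common "strongly subsuming" criterion: a higher-order mutant (HOM) qualifies if it is killed by at least one test, and every test that kills it also kills each of its first-order mutants. Whether the paper uses exactly this definition is an assumption. The program, the mutation points `m1`/`m2` and the tests are all hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple

Test = Tuple[int, int, int]


def program(a: int, b: int, c: int, active: FrozenSet[str] = frozenset()) -> int:
    """(a + b) * c with two hypothetical mutation points that can be switched on."""
    s = a - b if "m1" in active else a + b  # m1: '+' -> '-'
    k = -c if "m2" in active else c         # m2: 'c' -> '-c'
    return s * k


TESTS: List[Test] = [(1, 1, 1), (0, 1, 1), (0, 2, 3), (2, 0, 0)]
FIRST_ORDER = ["m1", "m2"]


def kill_set(active: FrozenSet[str]) -> FrozenSet[int]:
    """Indices of tests whose output differs from the unmutated program."""
    return frozenset(
        i for i, t in enumerate(TESTS) if program(*t, active) != program(*t)
    )


def strongly_subsuming(order: int = 2) -> List[FrozenSet[str]]:
    """Brute force over every combination of `order` first-order mutants."""
    fom_kills = {m: kill_set(frozenset({m})) for m in FIRST_ORDER}
    found = []
    for combo in combinations(FIRST_ORDER, order):
        hom = frozenset(combo)
        hom_kills = kill_set(hom)
        common = frozenset.intersection(*(fom_kills[m] for m in combo))
        # Killable, but only by tests that also kill every constituent mutant.
        if hom_kills and hom_kills <= common:
            found.append(hom)
    return found
```

Here each first-order mutant is killed by tests 0, 1 and 2. The two mutations partially mask each other, so the combined mutant is killed only by test 0. That makes it harder to detect than either of its parts.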
Submitted 4 April, 2020;
originally announced April 2020.
-
Teaching Software Engineering for AI-Enabled Systems
Authors:
Christian Kästner,
Eunsuk Kang
Abstract:
Software engineers have significant expertise to offer when building intelligent systems, drawing on decades of experience and methods for building systems that are scalable, responsive and robust, even when built on unreliable components. Systems with artificial-intelligence or machine-learning (ML) components raise new challenges and require careful engineering. We designed a new course to teach software-engineering skills to students with a background in ML. We specifically go beyond traditional ML courses that teach modeling techniques under artificial conditions and focus, in lecture and assignments, on realism with large and changing datasets, robust and evolvable infrastructure, and purposeful requirements engineering that considers ethics and fairness as well. We describe the course and our infrastructure and share experience and all material from teaching the course for the first time.
Submitted 18 January, 2020;
originally announced January 2020.
-
Learning-based Funnel-MPC for output-constrained nonlinear systems
Authors:
Thomas Berger,
Carolin Kästner,
Karl Worthmann
Abstract:
We exploit an adaptive control technique, namely funnel control, in order to establish both initial and recursive feasibility in Model Predictive Control (MPC) for output-constrained nonlinear systems. Moreover, we show that the resulting feedback controller outperforms the funnel controller both w.r.t. the required sampling rate for a zero-order-hold implementation and required control action. We further propose a combination of funnel control and MPC, exploiting the performance guarantees of the model-free funnel controller during a learning phase and the advantages of the model-based MPC scheme thereafter.
Submitted 4 December, 2019;
originally announced December 2019.
-
How Do Code Changes Evolve in Different Platforms? A Mining-based Investigation
Authors:
Markos Viggiato,
Johnatan Oliveira,
Eduardo Figueiredo,
Pooyan Jamshidi,
Christian Kästner
Abstract:
Code changes are performed differently in the mobile and non-mobile platforms. Prior work has investigated the differences in specific platforms. However, we still lack a deeper understanding of how code changes evolve across different software platforms. In this paper, we present a study aiming at investigating the frequency of changes and how source code, build and test changes co-evolve in mobile and non-mobile platforms. We developed regression models to explain which factors influence the frequency of changes and applied the Apriori algorithm to find types of changes that frequently co-occur. Our findings show that non-mobile repositories have a higher number of commits per month, and our regression models suggest that being mobile has a significant negative impact on the number of commits when controlling for confounding factors, such as code size. We also found that developers do not usually change source code files together with build or test files. We argue that our results can provide valuable information for developers on how changes are performed in different platforms so that practices adopted in successful software systems can be followed.
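Frequent co-occurrence of change types can be mined with the Apriori algorithm. In this framing, each commit is a transaction whose items are the kinds of files it touches. Below is a minimal textbook Apriori in Python: level-wise candidate generation with subset pruning. The labels "source", "build" and "test" and the example commits are hypothetical and are not data from the study.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def apriori(transactions: List[Set[str]], min_support: float) -> Dict[FrozenSet[str], float]:
    """Return every itemset whose support (fraction of transactions containing it) >= min_support."""
    n = len(transactions)

    def support(itemset: FrozenSet[str]) -> float:
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    current = {frozenset({i}) for t in transactions for i in t}
    current = {s for s in current if support(s) >= min_support}
    frequent: Dict[FrozenSet[str], float] = {}
    k = 1
    while current:
        frequent.update((s, support(s)) for s in current)
        k += 1
        # Join: union pairs of frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: a candidate survives only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent
```

Consider eight hypothetical commits, six of which touch source files, with only one pairing source with test and one pairing source with build. A minimum support of 0.2 then yields only the single change types. This mirrors, on toy data, the kind of pattern where source files are rarely changed together with build or test files.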
Submitted 24 October, 2019;
originally announced October 2019.
-
Design Dimensions for Software Certification: A Grounded Analysis
Authors:
Gabriel Ferreira,
Christian Kästner,
Joshua Sunshine,
Sven Apel,
William Scherlis
Abstract:
In many domains, software systems cannot be deployed until authorities judge them fit for use in an intended operating environment. Certification standards and processes have been devised and deployed to regulate operations of software systems and prevent their failures. However, practitioners are often unsatisfied with the efficiency and value proposition of certification efforts. In this study, we compare two certification standards, Common Criteria and DO-178C, and collect insights from literature and from interviews with subject-matter experts to identify design options relevant to the design of standards. The results of the comparison of certification efforts---leading to the identification of design dimensions that affect their quality---serve as a framework to guide the comparison, creation, and revision of certification standards and processes. This paper puts software engineering research in context and discusses key issues around process and quality assurance and includes observations from industry about relevant topics such as recertification, timely evaluations, but also technical discussions around model-driven approaches and formal methods. Our initial characterization of the design space of certification efforts can be used to inform technical discussions and to influence the directions of new or existing certification efforts. Practitioners, technical commissions, and government can directly benefit from our analytical framework.
Submitted 23 May, 2019;
originally announced May 2019.
-
ConfigCrusher: Towards White-Box Performance Analysis for Configurable Systems
Authors:
Miguel Velez,
Pooyan Jamshidi,
Florian Sattler,
Norbert Siegmund,
Sven Apel,
Christian Kästner
Abstract:
Stakeholders of configurable systems are often interested in knowing how configuration options influence the performance of a system to facilitate, for example, the debugging and optimization processes of these systems. Several black-box approaches can be used to obtain this information, but they either sample a large number of configurations to make accurate predictions or miss important performance-influencing interactions when sampling few configurations. Furthermore, black-box approaches cannot pinpoint the parts of a system that are responsible for performance differences among configurations. This article proposes ConfigCrusher, a white-box performance analysis that inspects the implementation of a system to guide the performance analysis, exploiting several insights of configurable systems in the process. ConfigCrusher employs a static data-flow analysis to identify how configuration options may influence control-flow statements and instruments code regions, corresponding to these statements, to dynamically analyze the influence of configuration options on the regions' performance. Our evaluation on 10 configurable systems shows the feasibility of our white-box approach to more efficiently build performance-influence models that are similar to or more accurate than current state-of-the-art approaches. Overall, we showcase the benefits of white-box performance analyses and their potential to outperform black-box approaches and provide additional information for analyzing configurable systems.
Submitted 14 July, 2020; v1 submitted 6 May, 2019;
originally announced May 2019.
-
Machine Learning Meets Quantitative Planning: Enabling Self-Adaptation in Autonomous Robots
Authors:
Pooyan Jamshidi,
Javier Cámara,
Bradley Schmerl,
Christian Kästner,
David Garlan
Abstract:
Modern cyber-physical systems (e.g., robotics systems) are typically composed of physical and software components, the characteristics of which are likely to change over time. Assumptions about parts of the system made at design time may not hold at run time, especially when a system is deployed for long periods (e.g., over decades). Self-adaptation is designed to find reconfigurations of systems to handle such run-time inconsistencies. Planners can be used to find and enact optimal reconfigurations in such an evolving context. However, for systems that are highly configurable, such planning becomes intractable due to the size of the adaptation space. To overcome this challenge, in this paper we explore an approach that (a) uses machine learning to find Pareto-optimal configurations without needing to explore every configuration and (b) restricts the search space to such configurations to make planning tractable. We explore this in the context of robot missions that need to consider task timeliness and energy consumption. An independent evaluation shows that our approach results in high-quality adaptation plans in uncertain and adversarial environments.
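The planner's search space is restricted to Pareto-optimal configurations with respect to the two mission objectives. The Python sketch below computes such a Pareto front by brute force over already-measured configurations. The paper uses machine learning precisely to avoid evaluating every configuration, so this only illustrates the dominance relation. The configuration names and the (time, energy) values are hypothetical, with lower being better for both.

```python
from typing import List, Tuple

Config = Tuple[str, float, float]  # (name, mission time in s, energy in J); lower is better


def dominates(a: Config, b: Config) -> bool:
    """a dominates b if it is no worse in both objectives and strictly better in one."""
    return a[1] <= b[1] and a[2] <= b[2] and (a[1] < b[1] or a[2] < b[2])


def pareto_front(configs: List[Config]) -> List[Config]:
    """Keep only configurations that no other configuration dominates (input order preserved)."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]
```

For example, consider `("fast", 100, 900)`, `("eco", 180, 400)`, `("balanced", 130, 600)` and `("wasteful", 190, 950)`. Only "wasteful" is dropped, because "fast" is both quicker and cheaper. A planner then only needs to choose among the remaining trade-offs.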
Submitted 9 March, 2019;
originally announced March 2019.
-
Faster Variational Execution with Transparent Bytecode Transformation
Authors:
Chu-Pan Wong,
Jens Meinicke,
Lukas Lazarek,
Christian Kästner
Abstract:
Variational execution is a novel dynamic analysis technique for exploring highly configurable systems and accurately tracking information flow. It is able to efficiently analyze many configurations by aggressively sharing redundancies of program executions. The idea of variational execution has been demonstrated to be effective in exploring variations in the program, especially when the configuration space grows out of control. Existing implementations of variational execution often require heavy lifting of the runtime interpreter, which is painstaking and error-prone. Furthermore, the performance of this approach is suboptimal. For example, the state-of-the-art variational execution interpreter for Java, VarexJ, slows down executions by 100 to 800 times over a single execution for small to medium-sized Java programs. Instead of modifying existing JVMs, we propose to transform existing bytecode to make it variational, so it can be executed on an unmodified commodity JVM. Our evaluation shows a dramatic improvement in performance over the state-of-the-art, with a speedup of 2 to 46 times, and high efficiency in sharing computations.
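The core data idea behind variational execution is that a value may differ between configurations. Parts that do not differ are computed once and shared. The Python sketch below models this with a hypothetical `Choice` type and a lifting function `apply2`. It is a toy, interpreter-level illustration of the concept, not the bytecode transformation proposed in the paper. It also performs no simplification of nested choices on the same feature.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union


@dataclass(frozen=True)
class Choice:
    """A value that depends on a boolean feature: `yes` if enabled, else `no`."""
    feature: str
    yes: "V"
    no: "V"


V = Union[int, Choice]


def apply2(op: Callable[[int, int], int], a: V, b: V) -> V:
    """Lift a binary operation to variational values, splitting only where needed."""
    if isinstance(a, Choice):
        return Choice(a.feature, apply2(op, a.yes, b), apply2(op, a.no, b))
    if isinstance(b, Choice):
        return Choice(b.feature, apply2(op, a, b.yes), apply2(op, a, b.no))
    return op(a, b)  # both concrete: computed once, shared by all configurations


def select(v: V, config: Dict[str, bool]) -> int:
    """Project a variational value onto one concrete configuration."""
    while isinstance(v, Choice):
        v = v.yes if config[v.feature] else v.no
    return v
```

For example, multiplying `Choice("FAST", 2, 1)` by the plain value `10` yields `Choice("FAST", 20, 10)` without running the program twice. Selecting a configuration afterwards recovers the result a plain execution would have produced.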
Submitted 11 September, 2018;
originally announced September 2018.
-
Understanding Differences among Executions with Variational Traces
Authors:
Jens Meinicke,
Chu-Pan Wong,
Christian Kästner,
Gunter Saake
Abstract:
One of the main challenges of debugging is to understand why the program fails for certain inputs but succeeds for others. This becomes especially difficult if the fault is caused by an interaction of multiple inputs. To debug such interaction faults, it is necessary to understand the individual effect of each input, how these inputs interact, and how these interactions cause the fault. The differences between two execution traces can explain why one input behaves differently than the other. We propose to compare execution traces of all input options to derive explanations of the behavior of all options and interactions among them. To make the relevant information stand out, we represent them as variational traces that concisely represent control-flow and data-flow differences among multiple concrete traces. While variational traces can be obtained from brute-force execution of all relevant inputs, we use variational execution to scale the generation of variational traces to the exponential space of possible inputs. We further provide an Eclipse plugin Varviz that enables users to use variational traces for debugging and navigation. In a user study, we show that users of variational traces finish debugging tasks more than twice as fast as users of the standard Eclipse debugger. We further show that variational traces can be scaled to programs with many options.
Submitted 10 July, 2018;
originally announced July 2018.
-
On the Relation of External and Internal Feature Interactions: A Case Study
Authors:
Sergiy Kolesnikov,
Norbert Siegmund,
Christian Kästner,
Sven Apel
Abstract:
Detecting feature interactions is imperative for accurately predicting performance of highly-configurable systems. State-of-the-art performance prediction techniques rely on supervised machine learning for detecting feature interactions, which, in turn, relies on time-consuming performance measurements to obtain training data. By providing information about potentially interacting features, we can reduce the number of required performance measurements and make the overall performance prediction process more time efficient. We expect that the information about potentially interacting features can be obtained by statically analyzing the source code of a highly-configurable system, which is computationally cheaper than performing multiple performance measurements. To this end, we conducted a qualitative case study in which we explored the relation between control-flow feature interactions (detected through static program analysis) and performance feature interactions (detected by performance prediction techniques using performance measurements). We found that a relation exists, which can potentially be exploited to predict performance interactions.
Submitted 22 January, 2018; v1 submitted 20 December, 2017;
originally announced December 2017.
-
Transfer Learning for Performance Modeling of Configurable Systems: An Exploratory Analysis
Authors:
Pooyan Jamshidi,
Norbert Siegmund,
Miguel Velez,
Christian Kästner,
Akshay Patel,
Yuvraj Agarwal
Abstract:
Modern software systems provide many configuration options which significantly influence their non-functional properties. To understand and predict the effect of configuration options, several sampling and learning strategies have been proposed, albeit often with significant cost to cover the highly dimensional configuration space. Recently, transfer learning has been applied to reduce the effort of constructing performance models by transferring knowledge about performance behavior across environments. While this line of research is promising to learn more accurate models at a lower cost, it is unclear why and when transfer learning works for performance modeling. To shed light on when it is beneficial to apply transfer learning, we conducted an empirical study on four popular software systems, varying software configurations and environmental conditions, such as hardware, workload, and software versions, to identify the key knowledge pieces that can be exploited for transfer learning. Our results show that in small environmental changes (e.g., homogeneous workload change), by applying a linear transformation to the performance model, we can understand the performance behavior of the target environment, while for severe environmental changes (e.g., drastic workload change) we can transfer only knowledge that makes sampling more efficient, e.g., by reducing the dimensionality of the configuration space.
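For small environmental changes, the abstract reports that a linear transformation of the performance model captures the target environment. A minimal way to estimate such a transformation is a least-squares fit of target against source measurements over a few configurations measured in both environments. The Python sketch below assumes that simple form (target ≈ alpha · source + beta). The function names and example numbers are hypothetical, not taken from the study.

```python
from typing import List, Tuple


def fit_linear_transfer(source: List[float], target: List[float]) -> Tuple[float, float]:
    """Least-squares fit of target ~ alpha * source + beta over configurations measured in both environments."""
    n = len(source)
    mean_s = sum(source) / n
    mean_t = sum(target) / n
    cov = sum((s - mean_s) * (t - mean_t) for s, t in zip(source, target))
    var = sum((s - mean_s) ** 2 for s in source)
    alpha = cov / var  # requires at least two distinct source values
    beta = mean_t - alpha * mean_s
    return alpha, beta


def transfer(alpha: float, beta: float, source_value: float) -> float:
    """Predict target-environment performance from a source-environment measurement."""
    return alpha * source_value + beta
```

With source measurements `[10, 20, 30]` and target measurements `[25, 45, 65]`, the fit gives alpha = 2 and beta = 5. A configuration that takes 50 s in the source environment is then predicted to take 105 s in the target. For drastic changes the abstract indicates that such a mapping no longer holds.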
Submitted 7 September, 2017;
originally announced September 2017.
-
Differential Testing for Variational Analyses: Experience from Developing KConfigReader
Authors:
Christian Kästner
Abstract:
Differential testing to solve the oracle problem has been applied in many scenarios where multiple supposedly equivalent implementations exist, such as multiple implementations of a C compiler. If the multiple systems disagree on the output for a given test input, we have likely discovered a bug without ever having to specify what the expected output is. Research on variational analyses (or variability-aware or family-based analyses) can benefit from similar ideas. The goal of most variational analyses is to perform an analysis, such as type checking or model checking, over a large number of configurations much faster than an existing traditional analysis could by analyzing each configuration separately. Variational analyses are very suitable for differential testing, since an existing non-variational analysis can provide the oracle for test cases that would otherwise be tedious or difficult to write. In this experience paper, I report how differential testing has helped in developing KConfigReader, a tool for translating the Linux kernel's kconfig model into a propositional formula. Differential testing allowed us to quickly build a large test base and incorporate external tests, which avoided many regressions during development and made KConfigReader likely the most precise kconfig extraction tool available.
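The oracle idea can be shown in miniature. A translator turns a configuration model into a propositional formula, and a direct per-configuration check of the same model serves as the oracle. Any configuration on which the two disagree points to a bug. The Python sketch below uses a hypothetical four-option model with only "depends on" edges and enumerates all configurations exhaustively, which is feasible only for tiny models. It is not KConfigReader's code, and `MODEL`, `translate` and `differential_test` are illustrative names.

```python
import itertools
from typing import Callable, Dict, List, Optional, Tuple

Config = Dict[str, bool]
Model = Dict[str, Optional[str]]
Clauses = List[Tuple[str, str]]  # (a, b) encodes the implication a -> b

# Hypothetical miniature kconfig-like model: option -> the option it depends on.
MODEL: Model = {"NET": None, "WIFI": "NET", "BT": "NET", "WIFI_DEBUG": "WIFI"}


def valid_direct(model: Model, config: Config) -> bool:
    """Oracle: check each dependency directly in one concrete configuration."""
    return all(not config[o] or dep is None or config[dep] for o, dep in model.items())


def translate(model: Model) -> Clauses:
    """Implementation under test: turn the model into implications option -> dependency."""
    return [(o, dep) for o, dep in model.items() if dep is not None]


def holds(clauses: Clauses, config: Config) -> bool:
    """Evaluate the translated formula in one configuration."""
    return all(not config[a] or config[b] for a, b in clauses)


def differential_test(
    model: Model, translator: Callable[[Model], Clauses] = translate
) -> List[Config]:
    """Return every configuration on which the translated formula and the oracle disagree."""
    clauses = translator(model)
    options = sorted(model)
    mismatches = []
    for values in itertools.product([False, True], repeat=len(options)):
        config = dict(zip(options, values))
        if holds(clauses, config) != valid_direct(model, config):
            mismatches.append(config)
    return mismatches
```

The correct translator produces no mismatches. A broken one, for example a translator that drops all clauses, is caught immediately, and no expected formula was ever written by hand.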
Submitted 28 June, 2017;
originally announced June 2017.
-
Transfer Learning for Improving Model Predictions in Highly Configurable Software
Authors:
Pooyan Jamshidi,
Miguel Velez,
Christian Kästner,
Norbert Siegmund,
Prasad Kawthekar
Abstract:
Modern software systems are built to be used in dynamic environments using configuration capabilities to adapt to changes and external uncertainties. In a self-adaptation context, we are often interested in reasoning about the performance of the systems under different configurations. Usually, we learn a black-box model based on real measurements to predict the performance of the system given a specific configuration. However, as modern systems become more complex, there are many configuration parameters that may interact and we end up learning an exponentially large configuration space. Naturally, this does not scale when relying on real measurements in the actual changing environment. We propose a different solution: instead of taking the measurements from the real system, we learn the model using samples from other sources, such as simulators that approximate performance of the real system at low cost. We define a cost model that transforms the traditional view of model learning into a multi-objective problem that takes into account not only model accuracy but also measurement effort. We evaluate our cost-aware transfer learning solution using real-world configurable software including (i) a robotic system, (ii) 3 different stream processing applications, and (iii) a NoSQL database system. The experimental results demonstrate that our approach can achieve (a) a high prediction accuracy, as well as (b) a high model reliability.
Submitted 20 April, 2017; v1 submitted 1 April, 2017;
originally announced April 2017.
-
Nanoparticle Size Distribution Quantification: Results of a SAXS Inter-Laboratory Comparison
Authors:
Brian R. Pauw,
Claudia Kästner,
Andreas F. Thünemann
Abstract:
We present the first world-wide inter-laboratory comparison of small-angle X-ray scattering (SAXS) for nanoparticle sizing. The measurands in this comparison are the mean particle radius, the width of the size distribution and the particle concentration. The investigated sample consists of dispersed silver nanoparticles, surrounded by a stabilizing polymeric shell of poly(acrylic acid). The silver cores dominate the X-ray scattering pattern, leading to the determination of their radii size distribution using: i) Glatter's Indirect Fourier Transformation method, ii) classical model fitting using SASfit and iii) a Monte Carlo fitting approach using McSAS. The application of these three methods to the collected datasets produces consistent mean number- and volume-weighted core radii of $R_n = 2.76$ nm and $R_v = 3.20$ nm, respectively. The corresponding widths of the log-normal radii distribution of the particles were $\sigma_n = 0.65$ nm and $\sigma_v = 0.71$ nm. The particle concentration determined using this method was $3.00 \pm 0.38$ g/L ($(4.20 \pm 0.73) \times 10^{-6}$ mol/L). We show that the results are slightly biased by the choice of data evaluation procedure, but that no substantial differences were found between the results from data measured on a very wide range of instruments: the participating laboratories at synchrotron SAXS beamlines, commercial and home-made instruments were all able to provide data of high quality. Our results demonstrate that SAXS is a qualified method for revealing particle size distributions in the sub-20 nm region (at least), out of reach for most other analytical methods.
Submitted 13 February, 2017;
originally announced February 2017.
-
Do #ifdefs Influence the Occurrence of Vulnerabilities? An Empirical Study of the Linux Kernel
Authors:
Gabriel Ferreira,
Momin Malik,
Christian Kästner,
Jürgen Pfeffer,
Sven Apel
Abstract:
Preprocessors support the diversification of software products with #ifdefs, but also require additional effort from developers to maintain and understand variable code. We conjecture that #ifdefs cause developers to produce more vulnerable code because they are required to reason about multiple features simultaneously and maintain complex mental models of dependencies of configurable code.
We extracted a variational call graph across all configurations of the Linux kernel and used configuration complexity metrics to compare vulnerable and non-vulnerable functions, considering their vulnerability history. Our goal was to learn whether we can observe a measurable influence of configuration complexity on the occurrence of vulnerabilities.
Our results suggest, among other findings, that vulnerable functions have higher variability than non-vulnerable ones and are also constrained by fewer configuration options. This suggests that developers tend to notice vulnerabilities in functions that appear in frequently compiled product variants. We aim to raise developers' awareness of the need to address variability more systematically, since configuration complexity is an important, but often ignored, aspect of software product lines.
Submitted 23 May, 2016;
originally announced May 2016.
-
A Comparison of 10 Sampling Algorithms for Configurable Systems
Authors:
Flávio Medeiros,
Christian Kästner,
Márcio Ribeiro,
Rohit Gheyi,
Sven Apel
Abstract:
Almost every software system provides configuration options to tailor the system to the target platform and application scenario. Often, this configurability renders the analysis of every individual system configuration infeasible. To address this problem, researchers have proposed a diverse set of sampling algorithms. We present a comparative study of 10 state-of-the-art sampling algorithms regarding their fault-detection capability and the size of their sample sets. The former is important for improving software quality and the latter for reducing analysis time. In a nutshell, we found that sampling algorithms with larger sample sets detect more faults, but simple algorithms with small sample sets, such as most-enabled-disabled, are the most efficient in most contexts. Furthermore, we observed that the limiting assumptions made in previous work influence the number of detected faults, the size of sample sets, and the ranking of algorithms. Finally, we identified a number of technical challenges in trying to avoid these limiting assumptions, which call into question the practicality of certain sampling algorithms.
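Most-enabled-disabled is cheap because it yields at most two configurations regardless of how many options exist: one valid configuration with as many options enabled as possible and one with as many disabled as possible. A minimal sketch, assuming the feature model is encoded as a Python predicate over a dict of Boolean options (a hypothetical encoding chosen for illustration); it enumerates all 2^n assignments, which only works for tiny examples, whereas practical tools delegate the search to a SAT solver.

```python
from itertools import product
from typing import Callable, Dict, List

Config = Dict[str, bool]


def most_enabled_disabled(options: List[str],
                          is_valid: Callable[[Config], bool]) -> List[Config]:
    """Return a valid config with the most options enabled and one with the most
    disabled (a single config if they coincide, none if the model is unsatisfiable)."""
    all_configs = (dict(zip(options, bits))
                   for bits in product([False, True], repeat=len(options)))
    valid = [c for c in all_configs if is_valid(c)]
    if not valid:
        return []
    most_enabled = max(valid, key=lambda c: sum(c.values()))
    most_disabled = min(valid, key=lambda c: sum(c.values()))
    if most_enabled == most_disabled:
        return [most_enabled]
    return [most_enabled, most_disabled]


def a_excludes_b(config: Config) -> bool:
    """Hypothetical feature-model constraint: options A and B are mutually exclusive."""
    return not (config["A"] and config["B"])


sample = most_enabled_disabled(["A", "B", "C"], a_excludes_b)
for config in sample:
    print(config)
```

The sample size stays constant as options are added, which is why the study can report this algorithm as efficient even though larger samples detect more faults.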
Submitted 16 February, 2016; v1 submitted 5 February, 2016;
originally announced February 2016.
-
Reify Your Collection Queries for Modularity and Speed!
Authors:
Paolo G. Giarrusso,
Klaus Ostermann,
Michael Eichberg,
Ralf Mitschke,
Tillmann Rendel,
Christian Kästner
Abstract:
Modularity and efficiency are often conflicting requirements, such that programmers have to trade one for the other. We analyze this dilemma in the context of programs operating on collections. Performance-critical code using collections often needs to be hand-optimized, leading to non-modular, brittle, and redundant code. In principle, this dilemma could be avoided by automatic collection-specific optimizations, such as fusion of collection traversals, usage of indexing, or reordering of filters. Unfortunately, it is not obvious how to encode such optimizations in terms of ordinary collection APIs, because the program operating on the collections is not reified and hence cannot be analyzed.
We propose SQuOpt, the Scala Query Optimizer, a deep embedding of the Scala collections API that allows such analyses and optimizations to be defined and executed within Scala, without relying on external tools or compiler extensions. SQuOpt provides the same "look and feel" (syntax and static typing guarantees) as the standard collections API. We evaluate SQuOpt by re-implementing several code analyses of the FindBugs tool using SQuOpt, obtaining average speedups of 12x with a maximum of 12800x, and hence demonstrate that SQuOpt can reconcile modularity and efficiency in real-world applications.
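The key idea of a deep embedding is that a query is built as a data structure instead of being executed immediately, so an optimizer can inspect and rewrite it before running it. SQuOpt itself is a Scala library; the sketch below is a Python analogue, not SQuOpt's API. The `Source`, `Filter` and `Map` node types and the `optimize`/`run` functions are hypothetical, and only one of the optimizations named above is shown: fusing adjacent filters into a single traversal.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple, Union


# Hypothetical node types: a query is a tree of data, not an executed computation.
@dataclass(frozen=True)
class Source:
    data: Tuple[Any, ...]


@dataclass(frozen=True)
class Filter:
    parent: "Query"
    pred: Callable[[Any], bool]


@dataclass(frozen=True)
class Map:
    parent: "Query"
    fn: Callable[[Any], Any]


Query = Union[Source, Filter, Map]


def optimize(q: Query) -> Query:
    """Rewrite rule: fuse chains of adjacent filters into one filter node."""
    if isinstance(q, Filter):
        parent = optimize(q.parent)
        if isinstance(parent, Filter):
            first, second = parent.pred, q.pred
            return Filter(parent.parent, lambda x: first(x) and second(x))
        return Filter(parent, q.pred)
    if isinstance(q, Map):
        return Map(optimize(q.parent), q.fn)
    return q


def run(q: Query) -> list:
    """Interpret the reified query; each Filter or Map node is one traversal."""
    if isinstance(q, Source):
        return list(q.data)
    if isinstance(q, Filter):
        return [x for x in run(q.parent) if q.pred(x)]
    return [q.fn(x) for x in run(q.parent)]


def count_filters(q: Query) -> int:
    """Count Filter nodes, i.e. filtering traversals the query would perform."""
    if isinstance(q, Source):
        return 0
    return (1 if isinstance(q, Filter) else 0) + count_filters(q.parent)


query = Map(Filter(Filter(Source(tuple(range(10))), lambda x: x % 2 == 0),
                   lambda x: x > 4),
            lambda x: x * x)
optimized = optimize(query)
print(count_filters(query), count_filters(optimized))  # 2 1
print(run(optimized))  # [36, 64]
```

Because the user writes the same chain of filter and map calls either way, the rewrite stays invisible to the caller, which is the modularity argument the abstract makes.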
Submitted 23 October, 2012;
originally announced October 2012.
-
Towards an efficient prover for the C1 paraconsistent logic
Authors:
Adolfo Neto,
Celso A. A. Kaestner,
Marcelo Finger
Abstract:
The KE inference system is a tableau method developed by Marco Mondadori and presented as an improvement, in terms of computational efficiency, over Analytic Tableaux. In the literature, there is no description of a theorem prover based on the KE method for the C1 paraconsistent logic. Paraconsistent logics have several applications, such as in robot control and medicine, and these applications could benefit from the existence of such a prover. We present a sound and complete KE system for C1, an informal specification of a strategy for the C1 prover, and problem families that can be used to evaluate provers for C1. The C1 KE system and the strategy described in this paper will be used to implement a KE-based prover for C1, which will be useful for those who study and apply paraconsistent logics.
Submitted 19 February, 2012;
originally announced February 2012.
-
Type-Safe Feature-Oriented Product Lines
Authors:
Sven Apel,
Christian Kaestner,
Armin Groesslinger,
Christian Lengauer
Abstract:
A feature-oriented product line is a family of programs that share a common set of features. A feature implements a stakeholder's requirement and represents a design decision and a configuration option; when added to a program, it introduces new structures, such as classes and methods, and refines existing ones, for example by extending methods. With feature-oriented decomposition, programs can be generated solely on the basis of a user's selection of features, by composing the corresponding feature code. A key challenge of feature-oriented product line engineering is how to guarantee the correctness of an entire feature-oriented product line, i.e., of all member programs generated from different combinations of features. As the number of valid feature combinations can grow exponentially with the number of features, it is not feasible to check all individual programs. The only feasible approach is to have a type system check the entire code base of the feature-oriented product line. We have developed such a type system on the basis of a formal model of a feature-oriented Java-like language. We demonstrate that the type system ensures that every valid program of a feature-oriented product line is well-typed and that the type system is complete.
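One way to picture the kind of check such a product-line type system must make: whenever code in one feature refers to a class or method introduced by another feature, every valid feature combination that selects the first feature must also select the second; otherwise some generated program would refer to a missing element. The sketch below shows only that reference check, with hypothetical feature names, element names and a feature model encoded as a Python predicate; it is not the paper's formal type system. It enumerates configurations by brute force purely for illustration, whereas product-line type checkers typically hand this implication to a SAT solver so that no individual program has to be generated.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Config = Dict[str, bool]
Reference = Tuple[str, str, str]  # (using feature, providing feature, element name)


def unsatisfied_references(features: List[str],
                           is_valid: Callable[[Config], bool],
                           references: List[Reference]) -> List[Reference]:
    """Return references for which some valid configuration selects the using
    feature but not the providing one (some product would be ill-typed)."""
    configs = (dict(zip(features, bits))
               for bits in product([False, True], repeat=len(features)))
    valid = [c for c in configs if is_valid(c)]
    return [(user, provider, name) for user, provider, name in references
            if any(c[user] and not c[provider] for c in valid)]


FEATURES = ["Base", "Logging", "Color"]
REFS = [("Logging", "Base", "Printer.print"),   # hypothetical cross-feature references
        ("Color", "Logging", "Logger.log")]


def loose_model(c: Config) -> bool:
    """Hypothetical feature model: Base is mandatory; Logging and Color are optional."""
    return c["Base"]


def strict_model(c: Config) -> bool:
    """Same model, plus the constraint that Color requires Logging."""
    return c["Base"] and (not c["Color"] or c["Logging"])


print(unsatisfied_references(FEATURES, loose_model, REFS))   # [('Color', 'Logging', 'Logger.log')]
print(unsatisfied_references(FEATURES, strict_model, REFS))  # []
```

The fix suggested by the output can go either way: add the constraint to the feature model, or move the referenced element into a feature that is always present when it is used.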
Submitted 20 January, 2010;
originally announced January 2010.