-
Employing Software Diversity in Cloud Microservices to Engineer Reliable and Performant Systems
Authors:
Nazanin Akhtarian,
Hamzeh Khazaei,
Marin Litoiu
Abstract:
In the ever-shifting landscape of software engineering, we recognize the need for adaptation and evolution to maintain system dependability. As each software iteration potentially introduces new challenges, from unforeseen bugs to performance anomalies, it becomes paramount to understand and address these intricacies to ensure robust system operations throughout the system's lifetime. This work proposes employing software diversity to enhance system reliability and performance simultaneously. A cornerstone of our work is the derivation of a reliability metric. This metric encapsulates the reliability and performance of each software version under adverse conditions. Using the calculated reliability score, we implemented a dynamic controller responsible for adjusting the population of each software version. The goal is to maintain a higher replica count for more reliable versions while preserving the diversity of versions as much as possible. This balance is crucial for ensuring not only the reliability but also the performance of the system against a spectrum of potential failures. In addition, we designed and implemented a diversity-aware autoscaling algorithm that maintains the reliability and performance of the system at the same time and at any scale. Our extensive experiments on realistic cloud microservice-based applications show the effectiveness of the proposed approach in promoting both reliability and performance.
Submitted 9 July, 2024;
originally announced July 2024.
-
A Learning-Based Caching Mechanism for Edge Content Delivery
Authors:
Hoda Torabi,
Hamzeh Khazaei,
Marin Litoiu
Abstract:
With the advent of 5G networks and the rise of the Internet of Things (IoT), Content Delivery Networks (CDNs) are increasingly extending into the network edge. This shift introduces unique challenges, particularly due to the limited cache storage and the diverse request patterns at the edge. These edge environments can host traffic classes characterized by varied object-size distributions and object-access patterns. Such complexity makes it difficult for traditional caching strategies, which often rely on metrics like request frequency or time intervals, to be effective. Despite these complexities, the optimization of edge caching is crucial. Improved byte hit rates at the edge not only alleviate the load on the network backbone but also minimize operational costs and expedite content delivery to end-users.
In this paper, we introduce HR-Cache, a comprehensive learning-based caching framework grounded in the principles of Hazard Rate (HR) ordering, a rule originally formulated to compute an upper bound on cache performance. HR-Cache leverages this rule to guide future object eviction decisions. It employs a lightweight machine learning model to learn from caching decisions made based on HR ordering, subsequently predicting the "cache-friendliness" of incoming requests. Objects deemed "cache-averse" are placed into cache as priority candidates for eviction. Through extensive experimentation, we demonstrate that HR-Cache not only consistently enhances byte hit rates compared to existing state-of-the-art methods but also achieves this with minimal prediction overhead.
Our experimental results, using three real-world traces and one synthetic trace, indicate that HR-Cache consistently achieves 2.2-14.6% greater WAN traffic savings than LRU. It outperforms not only heuristic caching strategies but also the state-of-the-art learning-based algorithm.
Submitted 3 April, 2024; v1 submitted 5 February, 2024;
originally announced February 2024.
-
LearnedWMP: Workload Memory Prediction Using Distribution of Query Templates
Authors:
Shaikh Quader,
Andres Jaramillo,
Sumona Mukhopadhyay,
Ghadeer Abuoda,
Calisto Zuzarte,
David Kalmuk,
Marin Litoiu,
Manos Papagelis
Abstract:
In a modern DBMS, working memory is frequently the limiting factor when processing in-memory analytic query operations such as joins, sorting, and aggregation. Existing resource estimation approaches for a DBMS estimate the resource consumption of a query by computing an estimate of each individual database operator in the query execution plan. Such an approach is slow and error-prone as it relies upon simplifying assumptions, such as uniformity and independence of the underlying data. Additionally, the existing approach focuses on individual queries separately and does not factor in other queries in the workload that may be executed concurrently. In this research, we are interested in query performance optimization under concurrent execution of a batch of queries (a workload). Specifically, we focus on predicting the memory demand for a workload rather than providing separate estimates for each query within it. We introduce the problem of workload memory prediction and formalize it as a distribution regression problem. We propose Learned Workload Memory Prediction (LearnedWMP) to improve and simplify estimating the working memory demands of workloads. Through a comprehensive experimental evaluation, we show that LearnedWMP reduces the memory estimation error of the state-of-the-practice method by up to 47.6%. Compared to an alternative single-query model, the LearnedWMP model and its variants were 3x to 10x faster during training and inference. Moreover, LearnedWMP-based models were at least 50% smaller in most cases. Overall, the results demonstrate the advantages of the LearnedWMP approach and its potential for a broader impact on query performance optimization.
Submitted 22 January, 2024;
originally announced January 2024.
-
Self-Adaptation in Industry: A Survey
Authors:
Danny Weyns,
Ilias Gerostathopoulos,
Nadeem Abbas,
Jesper Andersson,
Stefan Biffl,
Premek Brada,
Tomas Bures,
Amleto Di Salle,
Matthias Galster,
Patricia Lago,
Grace Lewis,
Marin Litoiu,
Angelika Musil,
Juergen Musil,
Panos Patros,
Patrizio Pelliccione
Abstract:
Computing systems form the backbone of many areas in our society, from manufacturing to traffic control, healthcare, and financial systems. When software plays a vital role in the design, construction, and operation, these systems are referred to as software-intensive systems. Self-adaptation equips a software-intensive system with a feedback loop that either automates tasks that otherwise need to be performed by human operators or deals with uncertain conditions. Such feedback loops have found their way to a variety of practical applications; typical examples are an elastic cloud to adapt computing resources and automated server management to respond quickly to business needs. To gain insight into the motivations for applying self-adaptation in practice, the problems solved using self-adaptation and how these problems are solved, and the difficulties and risks that industry faces in adopting self-adaptation, we performed a large-scale survey. We received 184 valid responses from practitioners spread over 21 countries. Based on the analysis of the survey data, we provide an empirically grounded overview of the state of the practice in the application of self-adaptation. From that, we derive insights for researchers to check their current research against industrial needs, and for practitioners to compare their current practice in applying self-adaptation. These insights also provide opportunities for the application of self-adaptation in practice and pave the way for future industry-research collaborations.
Submitted 6 November, 2022;
originally announced November 2022.
-
Building Automation System Data Integration with BIM: Data Structure and Supporting Case Study
Authors:
Caroline Quinn,
Ali Zargar Shabestari,
Marin Litoiu,
J. J. McArthur
Abstract:
Building Automation Systems (BAS) are ubiquitous in contemporary buildings, both monitoring building conditions and managing the building system control points. At present, these controls are prescriptive and pre-determined by the design team, rather than responsive to actual building performance. They are further limited by prescribed logic, possess only rudimentary visualizations, and lack broader system integration capabilities. Advances in machine learning, edge analytics, data management systems, and Facility Management-enabled Building Information Models (FM-BIMs) permit a novel approach: cloud-hosted building management. This paper presents an integration technique for mapping the data from a building Internet of Things (IoT) sensor network to an FM-BIM. The sensor data naming convention and timeseries analysis strategies integrated into the data structure are discussed and presented, including the use of a 3D nested list to permit timeseries data to be mapped to the FM-BIM and readily visualized. The developed approach is presented through a case study of an office living lab consisting of a local sensor network mimicking a BAS, which streams to a cloud server via a virtual private network connection. The resultant data structure and key visualizations are presented to demonstrate the value of this approach, which permits the end-user to select the desired timeframe for visualization and readily step through the spatio-temporal building performance data.
Submitted 25 August, 2022; v1 submitted 11 May, 2022;
originally announced May 2022.
-
Has Your FaaS Application Been Decommissioned Yet? -- A Case Study on the Idle Timeout in Function as a Service Infrastructure
Authors:
Kim Long Ngo,
Joydeep Mukherjee,
Zhen Ming Jiang,
Marin Litoiu
Abstract:
Function as a Service (FaaS) is a new cloud technology with automated resource management. Different from traditional cloud computing, each FaaS cloud function can only run for a fixed period of time before being decommissioned. Furthermore, FaaS cloud providers often update their platforms (e.g., idle timeout). These changes and their associated impact are not transparent and could potentially affect the execution of the cloud functions. Hence, in this paper, we develop a methodology to characterize the cloud function idle timeout, which is the duration a FaaS cloud provider keeps a cloud function instance alive without serving active traffic. Our study was conducted on three popular FaaS platforms, namely AWS Lambda, IBM, and Azure Cloud Function. Moreover, we also report how long a cloud function instance can be kept alive when a user regularly polls the instance. Experimental results show that the idle timeout period evolved between 01/2020 and 01/2022.
Submitted 18 March, 2022;
originally announced March 2022.
-
Towards Better Adaptive Systems by Combining MAPE, Control Theory, and Machine Learning
Authors:
Danny Weyns,
Bradley Schmerl,
Masako Kishida,
Alberto Leva,
Marin Litoiu,
Necmiye Ozay,
Colin Paterson,
Kenji Tei
Abstract:
Two established approaches to engineer adaptive systems are architecture-based adaptation, which uses a Monitor-Analysis-Planning-Executing (MAPE) loop that reasons over architectural models (aka Knowledge) to make adaptation decisions, and control-based adaptation, which relies on principles of control theory (CT) to realize adaptation. Recently, we also observe a rapidly growing interest in applying machine learning (ML) to support different adaptation mechanisms. While MAPE and CT have particular characteristics and strengths to be applied independently, in this paper, we are concerned with the question of how these approaches are related to one another and whether combining them and supporting them with ML can produce better adaptive systems. We motivate the combined use of different adaptation approaches using a scenario of a cloud-based enterprise system and illustrate the analysis when combining the different approaches. To conclude, we offer a set of open questions for further research in this interesting area.
Submitted 19 March, 2021;
originally announced March 2021.
-
Understanding Brain Dynamics for Color Perception using Wearable EEG headband
Authors:
Mahima Chaudhary,
Sumona Mukhopadhyay,
Marin Litoiu,
Lauren E Sergio,
Meaghan S Adams
Abstract:
The perception of color is an important cognitive feature of the human brain. The variety of colors that impinge upon the human eye can trigger changes in brain activity which can be captured using electroencephalography (EEG). In this work, we have designed a multiclass classification model to detect the primary colors from the features of raw EEG signals. In contrast to previous research, our method employs spectral power features, statistical features, and correlation features from the signal band power obtained from the continuous Morlet wavelet transform, instead of raw EEG, for the classification task. We have applied dimensionality reduction techniques such as Forward Feature Selection and Stacked Autoencoders to reduce the dimension of the data, ultimately increasing the model's efficiency. Our proposed methodology using Forward Selection and a Random Forest Classifier gave the best overall accuracy of 80.6% for intra-subject classification. Our approach shows promise in developing techniques for cognitive tasks using color cues, such as controlling Internet of Things (IoT) devices by looking at primary colors, for individuals with restricted motor abilities.
Submitted 17 August, 2020;
originally announced August 2020.
-
Performance Modeling of Microservice Platforms
Authors:
Hamzeh Khazaei,
Nima Mahmoudi,
Cornel Barna,
Marin Litoiu
Abstract:
Microservice architecture has transformed the way developers build and deploy applications in today's cloud computing centers. This new approach provides increased scalability, flexibility, manageability, and performance while reducing the complexity of the whole software development life cycle. The increase in cloud resource utilization also benefits microservice providers. Various microservice platforms have emerged to facilitate the DevOps of containerized services by enabling continuous integration and delivery. Microservice platforms deploy application containers on virtual or physical machines provided by public/private cloud infrastructures in a seamless manner. In this paper, we study and evaluate the provisioning performance of microservice platforms by incorporating the details of all layers (i.e., both micro and macro layers) in the modelling process. To this end, we first build a microservice platform on top of the Amazon EC2 cloud and then leverage it to develop a comprehensive performance model to perform what-if analysis and capacity planning for microservice platforms at scale. In other words, the proposed performance model provides a systematic approach to measure the elasticity of the microservice platform by analyzing the provisioning performance at both the microservice platform and the back-end macroservice infrastructures.
Submitted 3 October, 2020; v1 submitted 9 February, 2019;
originally announced February 2019.
-
Using Models at Runtime to Address Assurance for Self-Adaptive Systems
Authors:
Betty Cheng,
Kerstin Eder,
Martin Gogolla,
Lars Grunske,
Marin Litoiu,
Hausi Müller,
Patrizio Pelliccione,
Anna Perini,
Nauman Qureshi,
Bernhard Rumpe,
Daniel Schneider,
Frank Trollmann,
Norha Villegas
Abstract:
A self-adaptive software system modifies its behavior at runtime in response to changes within the system or in its execution environment. The fulfillment of the system requirements needs to be guaranteed even in the presence of adverse conditions and adaptations. Thus, a key challenge for self-adaptive software systems is assurance. Traditionally, confidence in the correctness of a system is gained through a variety of activities and processes performed at development time, such as design analysis and testing. In the presence of self-adaptation, however, some of the assurance tasks may need to be performed at runtime. This need calls for the development of techniques that enable continuous assurance throughout the software life cycle. Fundamental to the development of runtime assurance techniques is research into the use of models at runtime (M@RT). This chapter explores the state of the art for using M@RT to address the assurance of self-adaptive software systems. It defines what information can be captured by M@RT, specifically for the purpose of assurance, and puts this definition into the context of existing work. We then outline key research challenges for assurance at runtime and characterize assurance methods. The chapter concludes with an exploration of selected application areas where M@RT could provide significant benefits beyond existing assurance techniques for adaptive systems.
Submitted 5 May, 2015;
originally announced May 2015.