This paper releases and analyzes a month-long trace of 85 billion user requests and 11.9 million cold starts from Huawei's serverless cloud platform. Our analysis spans workloads from five data centers. We focus on cold starts and provide a comprehensive examination of the underlying factors influencing the number and duration of cold starts. These factors include trigger types, request synchronicity, runtime languages, and function resource allocations. We investigate components of cold starts, including pod allocation time, code and dependency deployment time, and scheduling delays, and examine their relationships with runtime languages, trigger types, and resource allocations. We introduce the pod utility ratio, which measures a pod's useful lifetime relative to its cold start time, giving a more complete picture of cold starts, and observe that some pods with long cold start times have longer useful lifetimes. Our findings reveal the complexity and multifaceted origins of the number, duration, and characteristics of cold starts, driven by differences in trigger types, runtime languages, and function resource allocations. For example, cold starts in Region 1 take up to 7 seconds, dominated by dependency deployment time and scheduling. In Region 2, cold starts take up to 3 seconds and are dominated by pod allocation time. Based on these findings, we identify opportunities to reduce the number and duration of cold starts using strategies for multi-region scheduling. Finally, we suggest directions for future research to address these challenges and enhance the performance of serverless cloud platforms. Our datasets and code are available at https://github.com/sir-lab/data-release.
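The pod utility ratio lends itself to a one-line computation over trace rows. Below is a minimal sketch assuming hypothetical column names (scheduled_at, ready_at, terminated_at, in seconds) rather than the released trace's actual schema:

```python
import pandas as pd

def pod_utility_ratio(pods: pd.DataFrame) -> pd.Series:
    """Ratio of a pod's useful lifetime to its cold start time.

    A ratio well above 1 means the cold start cost is amortised over a
    long useful lifetime; a ratio below 1 means the pod spent more time
    starting than serving.
    """
    cold_start = pods["ready_at"] - pods["scheduled_at"]        # cold start duration
    useful_lifetime = pods["terminated_at"] - pods["ready_at"]  # time spent serving
    return useful_lifetime / cold_start

# Two illustrative pods: a 7 s cold start amortised over an hour of service,
# and a 3 s cold start followed by only 7 s of useful lifetime.
pods = pd.DataFrame({
    "scheduled_at":  [0.0, 10.0],
    "ready_at":      [7.0, 13.0],
    "terminated_at": [3607.0, 20.0],
})
print(pod_utility_ratio(pods))  # ~514.3 and ~2.33
```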
Jul 26 2024
cs.LG arXiv:2407.17880v1
It is challenging to scale time series forecasting models such that they forecast accurately for multiple distinct domains and datasets, all with potentially different underlying collection procedures (e.g., sample resolution), patterns (e.g., periodicity), and prediction requirements (e.g., reconstruction vs. forecasting). We call this general task universal forecasting. Existing methods usually assume that input data is regularly sampled, and they forecast to pre-determined horizons, resulting in failure to generalise outside the scope of their training. We propose the DAM - a neural model that takes randomly sampled histories and outputs an adjustable basis composition as a continuous function of time for forecasting to non-fixed horizons. It involves three key components: (1) a flexible approach for using randomly sampled histories from a long-tail distribution, which enables an efficient global perspective of the underlying temporal dynamics while retaining focus on the recent history; (2) a transformer backbone that is trained on these actively sampled histories to produce, as representational output, (3) the basis coefficients of a continuous function of time. We show that a single univariate DAM, trained on 25 time series datasets, either outperformed or closely matched existing SoTA models at multivariate long-term forecasting across 18 datasets, including 8 held out for zero-shot transfer, even though these models were trained to specialise for each dataset-horizon combination. This single DAM excels at zero-shot transfer and very-long-term forecasting, performs well at imputation, is interpretable via basis function composition and attention, can be tuned for different inference-cost requirements, and is robust by design to missing and irregularly sampled data.
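The basis-composition output can be illustrated with a small sketch: a model emits coefficients for continuous basis functions of time, so the forecast can be evaluated at any horizon. The sinusoidal basis and all numbers below are illustrative assumptions, not the DAM's actual basis or weights:

```python
import numpy as np

def compose_forecast(coeffs: np.ndarray, periods: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Evaluate sum_i a_i*sin(2*pi*t/p_i) + b_i*cos(2*pi*t/p_i) at times t."""
    a, b = coeffs  # shape (2, n_basis): sine and cosine coefficients
    phase = 2 * np.pi * t[:, None] / periods[None, :]
    return (a * np.sin(phase) + b * np.cos(phase)).sum(axis=1)

periods = np.array([24.0, 168.0])            # e.g. daily and weekly periodicity (hours)
coeffs = np.array([[0.5, 0.2], [0.1, 0.3]])  # would come from the model backbone
t = np.linspace(0.0, 500.0, 1000)            # any horizon, not a fixed one
forecast = compose_forecast(coeffs, periods, t)
```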
Apr 05 2024
cs.DC arXiv:2404.03079v1
Accurately predicting task performance at runtime in a cluster is advantageous for a resource management system to determine whether a task should be migrated due to performance degradation caused by interference. This is beneficial for both cluster operators and service owners. However, deploying performance prediction systems with learning methods requires sophisticated safeguard mechanisms due to the inherent stochastic and black-box nature of these models, such as Deep Neural Networks (DNNs). Vanilla Neural Networks (NNs) can be vulnerable to out-of-distribution data samples that can lead to sub-optimal decisions. To take a step towards a safe learning system in performance prediction, we propose vPALs, which leverages well-correlated system metrics and verification to produce safe performance predictions at runtime, providing an extra layer of safety when integrating learning techniques into cluster resource management systems. Our experiments show that vPALs can outperform vanilla NNs across our benchmark workload.
This paper releases and analyzes two new Huawei cloud serverless traces. The traces span a period of over 7 months with over 1.4 trillion function invocations combined. The first trace is derived from Huawei's internal workloads and contains detailed per-second statistics for 200 functions running across multiple Huawei cloud data centers. The second trace is a representative workload from Huawei's public FaaS platform. This trace contains per-minute arrival rates for over 5000 functions running in a single Huawei data center. We present the internals of a production FaaS platform by characterizing resource consumption, cold-start times, programming languages used, periodicity, per-second versus per-minute burstiness, correlations, and popularity. Our findings show that there is considerable diversity in how serverless functions behave: requests vary by up to 9 orders of magnitude across functions, with some functions executed over 1 billion times per day; scheduling time, execution time and cold-start distributions vary across 2 to 4 orders of magnitude and have very long tails; and function invocation counts demonstrate strong periodicity for many individual functions and on an aggregate level. Our analysis also highlights the need for further research in estimating resource reservations and time-series prediction to account for the huge diversity in how serverless functions behave. Datasets and code are available at https://github.com/sir-lab/data-release.
Oct 12 2021
cs.CR arXiv:2110.04618v1
While disk encryption is suitable for use in most situations where confidentiality of disks is required, stronger guarantees are required in situations where adversaries may employ coercive tactics to gain access to cryptographic keys. Deniable volumes are one such solution in which the security goal is to prevent an adversary from discovering that there is an encrypted volume. Multiple snapshot attacks, where an adversary is able to gain access to two or more images of a disk, have often been proposed in the deniable storage system literature; however, there have been no concrete attacks proposed or carried out. We present the first multiple snapshot attack, and we find that it is applicable to most, if not all, implemented deniable storage systems. Our attack leverages the pattern of consecutive block changes an adversary would have access to with two snapshots, and we demonstrate that with high probability it detects moderately sized and large hidden volumes, while maintaining a low false positive rate.
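The core signal the attack exploits can be sketched directly: diff two snapshots block by block and look for long runs of consecutively changed blocks, which a hidden volume tends to produce. The block size and detection threshold below are assumptions for illustration:

```python
def changed_runs(snap_a: bytes, snap_b: bytes, block_size: int = 4096) -> list[int]:
    """Lengths of maximal runs of consecutive blocks that differ between snapshots."""
    runs, current = [], 0
    for off in range(0, min(len(snap_a), len(snap_b)), block_size):
        if snap_a[off:off + block_size] != snap_b[off:off + block_size]:
            current += 1          # extend the current run of changed blocks
        elif current:
            runs.append(current)  # a run just ended
            current = 0
    if current:
        runs.append(current)
    return runs

def suspect_hidden_volume(runs: list[int], threshold: int = 256) -> bool:
    # e.g. >= 1 MiB of consecutively changed 4 KiB blocks (assumed threshold)
    return any(r >= threshold for r in runs)
```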
Cloud providers offer end-users various pricing schemes to allow them to tailor VMs to their needs, e.g., a pay-as-you-go billing scheme, called on-demand, and a discounted contract scheme, called reserved instances. This paper presents a cloud broker which offers users both the flexibility of on-demand instances and some level of the discounts found in reserved instances. The broker employs a buy-low-and-sell-high strategy that places user requests into a resource pool of pre-purchased discounted cloud resources. By analysing user request time-series data, the broker takes a risk-oriented approach to dynamically adjust the resource pool. This approach requires no training process, which makes it well suited to processing large data streams. The broker is evaluated with high-frequency real cloud datasets from Alibaba. The results show that the overall profit of the broker is close to the theoretical optimal scenario where user requests can be perfectly predicted.
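A toy sketch of the buy-low-and-sell-high idea follows: serve requests from a pool of pre-purchased reserved capacity, overflow to on-demand, and size the pool from a demand quantile. All prices and the quantile rule are assumptions, not the paper's risk model:

```python
import statistics

RESERVED_HOURLY = 0.06   # discounted contract price per VM-hour (assumed)
ON_DEMAND_HOURLY = 0.10  # pay-as-you-go price (assumed)
USER_PRICE = 0.08        # what the broker charges users (assumed)

def broker_hour(pool_size: int, demand: int) -> float:
    """Profit for one hour: revenue minus reserved-pool and overflow costs."""
    overflow = max(0, demand - pool_size)   # requests served on-demand at a loss
    revenue = demand * USER_PRICE
    cost = pool_size * RESERVED_HOURLY + overflow * ON_DEMAND_HOURLY
    return revenue - cost

# A risk-oriented controller might size the pool to a demand quantile:
history = [40, 55, 48, 60, 52, 70, 45]
pool = int(statistics.quantiles(history, n=10)[7])  # ~80th percentile headroom
print(pool, broker_hour(pool, demand=65))
```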
Dec 01 2020
cs.NI arXiv:2011.14795v1
Wireless sensor networks (WSNs) are characterized by a network of small, battery-powered devices, operating remotely with no pre-existing infrastructure. The unique structure of WSNs allows for novel approaches to data reduction and energy preservation. This paper presents a modification to the existing Q-routing protocol by providing an alternate action of performing sensor data reduction in place, thereby reducing energy consumption, bandwidth usage, and message transmission time. The algorithm is further modified to include an energy factor which increases the cost of forwarding as energy reserves deplete. This encourages the network to conserve energy in favor of network preservation when energy reserves are low. Our experimental results show that this approach can, in periods of high network traffic, simultaneously reduce bandwidth, conserve energy, and maintain low message transmission times.
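A sketch of the modified Q-routing update is shown below. The delay-estimate update is standard Q-routing; the energy factor that inflates forwarding cost as reserves deplete is modelled as a simple reciprocal, which is an assumed shape rather than the paper's exact formula:

```python
def q_update(q, node, dest, neighbour, observed_delay, energy_frac, alpha=0.5):
    """Update node's estimate of delivery time to dest via neighbour.

    q[(n, d)] is n's current estimate of remaining delivery time to d.
    energy_frac in (0, 1] is the neighbour's remaining energy; lower energy
    raises the effective cost of forwarding through it (assumed penalty shape).
    """
    energy_factor = 1.0 / max(energy_frac, 1e-6)
    target = (observed_delay + q.get((neighbour, dest), 0.0)) * energy_factor
    key = (node, dest)
    old = q.get(key, target)               # bootstrap on the first sample
    q[key] = old + alpha * (target - old)  # standard Q-routing learning step
    return q[key]

q = {}
print(q_update(q, node="A", dest="SINK", neighbour="B",
               observed_delay=1.5, energy_frac=0.9))
```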
Understanding and predicting the performance of big data applications running in the cloud or on-premises could help minimise the overall cost of operations and provide opportunities in efforts to identify performance bottlenecks. The complexity of the low-level internals of big data frameworks and the sheer number of application and workload configuration parameters make it challenging and expensive to come up with comprehensive performance modelling solutions. In this paper, instead of focusing on a wide range of configurable parameters, we studied the low-level internals of the MapReduce communication pattern and used a minimal set of performance drivers to develop a set of phase-level parametric models for approximating the execution time of a given application on a given cluster. Models can be used to infer the performance of unseen applications and approximate their performance when an arbitrary dataset is used as input. Our approach is validated by running empirical experiments in two setups. On average, the error in both setups is within ±10% of the measured values.
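A phase-level parametric model of this kind can be sketched as a sum of per-phase terms fitted against input size. The three-phase decomposition, the affine form, and the numbers are assumptions consistent with the abstract, not the paper's fitted models:

```python
import numpy as np

def fit_phase_model(input_sizes, phase_times):
    """Least-squares fit of t_phase = a * size + b for one phase."""
    A = np.vstack([input_sizes, np.ones_like(input_sizes)]).T
    coeffs, *_ = np.linalg.lstsq(A, phase_times, rcond=None)
    return coeffs  # (a, b)

def predict_job_time(models, size):
    """Total job time is the sum of the per-phase predictions."""
    return sum(a * size + b for a, b in models.values())

sizes = np.array([1.0, 2.0, 4.0, 8.0])  # input dataset sizes in GB (assumed)
measurements = {
    "map":     np.array([10.0, 19.0, 40.0, 81.0]),
    "shuffle": np.array([5.0, 11.0, 20.0, 39.0]),
    "reduce":  np.array([8.0, 15.0, 33.0, 62.0]),
}
models = {phase: fit_phase_model(sizes, t) for phase, t in measurements.items()}
print(predict_job_time(models, 16.0))  # extrapolate to an unseen input size
```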
We extend previously developed two-level coarsening procedures for graph Laplacian problems written in a mixed saddle point form to the fully recursive multilevel case. The resulting hierarchy of discretizations gives rise to a hierarchy of upscaled models, in the sense that they provide approximation in the natural norms (in the mixed setting). This property enables us to utilize them in three applications: (i) as an accurate reduced model, (ii) as a tool in multilevel Monte Carlo simulations (in application to finite volume discretizations), and (iii) for providing a sequence of nonlinear operators in FAS (full approximation scheme) for solving nonlinear pressure equations discretized by the conservative two-point flux approximation. We illustrate the potential of the proposed multilevel technique in all three applications on a number of popular benchmark problems used in reservoir simulation.
We construct an algebraic multigrid (AMG) based preconditioner for the reduced Hessian of a linear-quadratic optimization problem constrained by an elliptic partial differential equation. While the preconditioner generalizes a geometric multigrid preconditioner introduced in earlier works, its construction relies entirely on a standard AMG infrastructure built for solving the forward elliptic equation, thus allowing for it to be implemented using a variety of AMG methods and standard packages. Our analysis establishes a clear connection between the quality of the preconditioner and the AMG method used. The proposed strategy has a broad and robust applicability to problems with unstructured grids, complex geometry, and varying coefficients. The method is implemented using the Hypre package and several numerical examples are presented.
Robert Anderson, Julian Andrej, Andrew Barker, Jamie Bramwell, Jean-Sylvain Camier, Jakub Cerveny, Veselin Dobrev, Yohann Dudouit, Aaron Fisher, Tzanio Kolev, Will Pazner, Mark Stowell, Vladimir Tomov, Johann Dahm, David Medina, Stefano Zampini
MFEM is an open-source, lightweight, flexible and scalable C++ library for modular finite element methods that features arbitrary high-order finite element meshes and spaces, support for a wide variety of discretization approaches and emphasis on usability, portability, and high-performance computing efficiency. MFEM's goal is to provide application scientists with access to cutting-edge algorithms for high-order finite element meshing, discretizations and linear solvers, while enabling researchers to quickly and easily develop and test new algorithms in very general, fully unstructured, high-order, parallel and GPU-accelerated settings. In this paper we describe the underlying algorithms and finite element abstractions provided by MFEM, discuss the software implementation, and illustrate various applications of the library.
Blesson Varghese, Philipp Leitner, Suprio Ray, Kyle Chard, Adam Barker, Yehia Elkhatib, Herry Herry, Cheol-Ho Hong, Jeremy Singer, Fung Po Tso, Eiko Yoneki, Mohamed-Faten Zhani
Feb 12 2019
cs.DC arXiv:1902.03656v1
The Cloud has become integral to most Internet-based applications and user gadgets. This article provides a brief history of the Cloud and presents a researcher's view of the prospects for innovating at the infrastructure, middleware, and application and delivery levels of the already crowded Cloud computing stack.
Aug 29 2018
cs.SE arXiv:1808.09199v1
How experiences gained in industry can improve academic research and teaching.
Jul 30 2018
cs.DC arXiv:1807.10507v1
Cloud computing is becoming an almost ubiquitous part of the computing landscape. For many companies today, moving their entire infrastructure and workloads to the cloud reduces complexity, time to deployment, and saves money. Spot Instances, a subset of Amazon's cloud computing infrastructure (EC2), expand on this. They allow a user to bid on spare compute capacity in Amazon's data centres at heavily discounted prices. If demand were ever to increase such that the user's maximum bid is exceeded, their instance is terminated. In this paper, we conduct one of the first detailed analyses of how location affects the overall cost of deployment of a spot instance. We analyse pricing data across all available Amazon Web Services regions for 60 days for a variety of spot instance types. We relate the data we find to the overall AWS region as well as to the Availability Zone within that region. We conclude that location does play a critical role in spot instance pricing and also that pricing differs depending on the granularity of that location - from a more coarse-grained AWS region to a more fine-grained Availability Zone within a region. We relate the pricing differences we find to the price's reliability, confirming whether one can be confident in the prices reported and subsequently, in the ensuing bids one makes. We conclude by showing that it is possible to run workloads on Spot Instances achieving both a very low risk of termination as well as paying very low amounts per hour.
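The kind of per-location analysis described can be sketched as follows: given a spot price series for one instance type in one Availability Zone, estimate the mean hourly cost and the fraction of time a given bid would have been out-bid. The price samples below are made up:

```python
# Hourly spot price samples ($/hour) for one instance type in one AZ (invented)
prices = [0.031, 0.029, 0.034, 0.120, 0.030, 0.028, 0.033]

def evaluate_bid(prices, bid):
    """Mean price paid while running, and fraction of hours the bid was exceeded."""
    paid = [p for p in prices if p <= bid]   # hours the instance survived
    termination_rate = 1 - len(paid) / len(prices)
    mean_cost = sum(paid) / len(paid) if paid else float("nan")
    return mean_cost, termination_rate

print(evaluate_bid(prices, bid=0.05))  # low cost, one terminating price spike
```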
Nov 28 2017
cs.DC arXiv:1711.09138v2
The recent boom of big data, coupled with the challenges of its processing and storage gave rise to the development of distributed data processing and storage paradigms like MapReduce, Spark, and NoSQL databases. With the advent of cloud computing, processing and storing such massive datasets on clusters of machines is now feasible with ease. However, there are limited tools and approaches, which users can rely on to gauge and comprehend the performance of their big data applications deployed locally on clusters, or in the cloud. Researchers have started exploring this area by providing benchmarking suites suitable for big data applications. However, many of these tools are fragmented, complex to deploy and manage, and do not provide transparency with respect to the monetary cost of benchmarking an application. In this paper, we present Plug And Play Bench (PAPB), an infrastructure-aware abstraction built to integrate and simplify the deployment of big data benchmarking tools on clusters of machines. PAPB automates the tedious process of installing, configuring and executing common big data benchmark workloads by containerising the tools and settings based on the underlying cluster deployment framework. Our proof of concept implementation utilises HiBench as the benchmark suite, HDP as the cluster deployment framework and Azure as the cloud platform. The paper further illustrates the inclusion of cost metrics based on the underlying Microsoft Azure cloud platform.
Nov 27 2017
cs.DC arXiv:1711.08973v1
Cloud computing has been widely adopted due to the flexibility in resource provisioning and on-demand pricing models. Entire clusters of Virtual Machines (VMs) can be dynamically provisioned to meet the computational demands of users. However, from a user's perspective, it is still challenging to utilise cloud resources efficiently. This is because an overwhelmingly wide variety of resource types with different prices and significant performance variations are available. This paper presents a survey and taxonomy of existing research in optimising the execution of Bag-of-Tasks (BoT) applications on cloud resources. A BoT application consists of multiple independent tasks, each of which can be executed by a VM in any order; these applications are widely used by both the scientific communities and commercial organisations. The objectives of this survey are as follows: (i) to provide the reader with a concise understanding of existing research on optimising the execution of BoT applications on the cloud, (ii) to define a taxonomy that categorises current frameworks to compare and contrast them, and (iii) to present current trends and future research directions in the area.
Nov 16 2017
cs.DC arXiv:1711.05518v1
This paper presents MAMoC, a framework which brings together a diverse range of infrastructure types including mobile devices, cloudlets, and remote cloud resources under one unified API. MAMoC allows mobile applications to leverage the power of multiple offloading destinations. MAMoC's intelligent offloading decision engine adapts to the contextual changes in this heterogeneous environment, in order to reduce the overall runtime for both single-site and multi-site offloading scenarios. MAMoC is evaluated through a set of offloading experiments, which evaluate the performance of our offloading decision engine. The results show that offloading computation using our framework can reduce the overall task completion time for both single-site and multi-site offloading scenarios.
Jul 11 2017
cs.DC arXiv:1707.02639v1
Current HPC platforms do not provide the infrastructure, interfaces and conceptual models to collect, store, analyze, and access telemetry data. Today, applications depend on application- and platform-specific techniques for collecting telemetry data, introducing significant development overheads that inhibit portability and mobility. The development and adoption of adaptive, context-aware strategies is thereby impaired. To facilitate second-generation applications, more efficient application development, and swift adoption of adaptive applications in production, a comprehensive framework for telemetry data management must be provided by future HPC systems and services. We introduce a conceptual model and a software framework to collect, store, analyze, and exploit streams of telemetry data generated by HPC systems and their applications. We show how this framework can be integrated with HPC platform architectures and how it enables common application execution strategies.
Feb 21 2017
cs.DC arXiv:1702.05513v2
A new class of second-generation high-performance computing applications with heterogeneous, dynamic and data-intensive properties has an extended set of requirements, which cover application deployment, resource allocation and control, and I/O scheduling. These requirements are not met by the current production HPC platform models and policies. This results in a loss of opportunity, productivity and innovation for new computational methods and tools. It also decreases effective system utilization for platform providers due to unsupervised workarounds and rogue resource management strategies implemented in application space. In this paper we critically discuss the dominant HPC platform model and describe the challenges it creates for second-generation applications because of its asymmetric resource view, interfaces and software deployment policies. We present an extended, more symmetric and application-centric platform model that adds decentralized deployment, introspection, bidirectional control and information flow, and more comprehensive resource scheduling. We describe cHPC: an early prototype of a non-disruptive implementation based on Linux Containers (LXC). It can operate alongside existing batch queuing systems and exposes a symmetric platform API without interfering with existing applications and usage modes. We see our approach as a viable, incremental next step in HPC platform evolution that benefits applications and platform providers alike. To demonstrate this further, we lay out a roadmap for future research and experimental evaluation.
Aug 02 2016
cs.DC arXiv:1608.00406v1
How can applications be deployed on the cloud to achieve maximum performance? This question is challenging to address with the availability of a wide variety of cloud Virtual Machines (VMs) with different performance capabilities. The research reported in this paper addresses the above question by proposing a six-step benchmarking methodology in which a user provides a set of weights that indicate how important memory, local communication, computation and storage-related operations are to an application. The user can either provide a set of four abstract weights or eight fine-grained weights based on knowledge of the application. The weights along with benchmarking data collected from the cloud are used to generate two rankings - one based only on the performance of the VMs, the other taking both performance and cost into account. The rankings are validated on three case study applications using two validation techniques. The case studies on a set of experimental VMs highlight that maximum performance can be achieved by the three top-ranked VMs and maximum performance in a cost-effective manner is achieved by at least one of the top three ranked VMs produced by the methodology.
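The ranking step can be sketched as a weighted sum of normalised benchmark scores, with a second ranking that divides by price. The four groups follow the abstract; the scores, weights, prices, and the value-for-money formula are assumptions:

```python
GROUPS = ["memory", "local_communication", "computation", "storage"]

def rank_vms(vms, weights, prices):
    """vms: {name: {group: raw benchmark score, higher is better}}."""
    maxima = {g: max(v[g] for v in vms.values()) for g in GROUPS}
    perf = {name: sum(weights[g] * v[g] / maxima[g] for g in GROUPS)
            for name, v in vms.items()}
    perf_rank = sorted(perf, key=perf.get, reverse=True)
    # Second ranking: performance per dollar (assumed formula)
    value_rank = sorted(perf, key=lambda n: perf[n] / prices[n], reverse=True)
    return perf_rank, value_rank

vms = {"m3.xlarge": {"memory": 8, "local_communication": 6, "computation": 7, "storage": 5},
       "c3.xlarge": {"memory": 6, "local_communication": 7, "computation": 9, "storage": 4}}
weights = {"memory": 0.4, "local_communication": 0.1, "computation": 0.4, "storage": 0.1}
prices = {"m3.xlarge": 0.266, "c3.xlarge": 0.210}  # $/hour, invented
print(rank_vms(vms, weights, prices))
```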
Apr 18 2016
cs.AI arXiv:1604.04506v1
This paper presents the first framework for integrating procedural knowledge, or "know-how", into the Linked Data Cloud. Know-how available on the Web, such as step-by-step instructions, is largely unstructured and isolated from other sources of online knowledge. To overcome these limitations, we propose extending to procedural knowledge the benefits that Linked Data has already brought to representing, retrieving and reusing declarative knowledge. We describe a framework for representing generic know-how as Linked Data and for automatically acquiring this representation from existing resources on the Web. This system also allows the automatic generation of links between different know-how resources, and between those resources and other online knowledge bases, such as DBpedia. We discuss the results of applying this framework to a real-world scenario and we show how it outperforms existing manual community-driven integration efforts.
Mar 25 2016
cs.DC arXiv:1603.07357v1
Existing benchmarking methods are time-consuming processes as they typically benchmark the entire Virtual Machine (VM) in order to generate accurate performance data, making them less suitable for real-time analytics. The research in this paper aims to surmount the above challenge by presenting DocLite, a Docker container-based lightweight benchmarking tool. DocLite explores lightweight cloud benchmarking methods for rapidly executing benchmarks in near real-time. DocLite is built on the Docker container technology, which allows a user-defined memory size and number of CPU cores of the VM to be benchmarked. The tool incorporates two benchmarking methods - the first, referred to as the native method, employs containers to benchmark a small portion of the VM and generate performance ranks, and the second uses historic benchmark data along with the native method as a hybrid to generate VM ranks. The proposed methods are evaluated on three use-cases and are observed to be up to 91 times faster than benchmarking the entire VM. In both methods, small containers provide the same quality of rankings as a large container. The native method generates ranks with over 90% and 86% accuracy for sequential and parallel execution of an application compared against benchmarking the whole VM. The hybrid method did not improve the quality of the rankings significantly.
Mar 08 2016
cs.AI arXiv:1603.01722v1
The increasing amount of available Linked Data resources is laying the foundations for more advanced Semantic Web applications. One of their main limitations, however, remains the general low level of data quality. In this paper we focus on a measure of quality which is negatively affected by the increase of the available resources. We propose a measure of semantic richness of Linked Data concepts and we demonstrate our hypothesis that the more a concept is reused, the less semantically rich it becomes. This is a significant scalability issue, as one of the core aspects of Linked Data is the propagation of semantic information on the Web by reusing common terms. We prove our hypothesis with respect to our measure of semantic richness and we validate our model empirically. Finally, we suggest possible future directions to address this scalability problem.
Jan 18 2016
cs.DC arXiv:1601.03872v1
With the availability of a wide range of cloud Virtual Machines (VMs) it is difficult to determine which VMs can maximise the performance of an application. Benchmarking is commonly used to this end for capturing the performance of VMs. Most cloud benchmarking techniques are heavyweight - time-consuming processes which have to benchmark the entire VM in order to obtain accurate benchmark data. Such benchmarks cannot be used in real-time on the cloud and incur extra costs even before an application is deployed. In this paper, we present lightweight cloud benchmarking techniques that execute quickly and can be used in near real-time on the cloud. The exploration of lightweight benchmarking techniques is facilitated by the development of DocLite (Docker Container-based Lightweight Benchmarking). DocLite is built on the Docker container technology, which allows a user-defined portion (such as memory size and the number of CPU cores) of the VM to be benchmarked. DocLite operates in two modes: in the first mode, containers are used to benchmark a small portion of the VM to generate performance ranks; in the second mode, historic benchmark data is used along with the first mode as a hybrid to generate VM ranks. The generated ranks are evaluated against three scientific high-performance computing applications. The proposed techniques are up to 91 times faster than a heavyweight technique which benchmarks the entire VM. It is observed that the first mode can generate ranks with over 90% and 86% accuracy for sequential and parallel execution of an application. The hybrid mode improves the correlation slightly but the first mode is sufficient for benchmarking cloud VMs.
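The container-based idea can be sketched with a few lines of Python around Docker: benchmark only a user-defined slice of the VM by capping the container's memory and pinning it to a subset of cores. The image name and wrapper are placeholders; --memory and --cpuset-cpus are standard Docker flags:

```python
import subprocess

def run_slice_benchmark(mem_gb: int, cores: str, image: str = "benchmark:latest") -> str:
    """Run a benchmark container against a user-defined slice of the VM."""
    cmd = [
        "docker", "run", "--rm",
        f"--memory={mem_gb}g",     # cap the container's memory
        f"--cpuset-cpus={cores}",  # pin to a subset of the VM's cores
        image,
    ]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

# e.g. benchmark a 1 GB, 2-core slice instead of the whole VM:
# print(run_slice_benchmark(1, "0-1"))
```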
Jul 21 2015
cs.DC arXiv:1507.05467v1
Optimising the execution of Bag-of-Tasks (BoT) applications on the cloud is a hard problem due to the trade-offs between performance and monetary cost. The problem can be further complicated when multiple BoT applications need to be executed. In this paper, we propose and implement a heuristic algorithm that schedules tasks of multiple applications onto different cloud virtual machines in order to maximise performance while satisfying a given budget constraint. Current approaches are limited in task scheduling since they place a limit on the number of cloud resources that can be employed by the applications. However, in the proposed algorithm there are no such limits, and in comparison with other approaches, the algorithm on average achieves an improved performance of 10%. The experimental results also highlight that the algorithm yields consistent performance even with low budget constraints which cannot be achieved by competing approaches.
Jul 21 2015
cs.DC arXiv:1507.05470v1
Scheduling Bag-of-Tasks (BoT) applications on the cloud can be more challenging than grid and cluster environments. This is because a user may have a budgetary constraint or a deadline for executing the BoT application in order to keep the overall execution costs low. The research in this paper is motivated to investigate task scheduling on the cloud, given two hard constraints based on a user-defined budget and a deadline. A heuristic algorithm is proposed and implemented to satisfy the hard constraints for executing the BoT application in a cost effective manner. The proposed algorithm is evaluated using four scenarios that are based on the trade-off between performance and the cost of using different cloud resource types. The experimental evaluation confirms the feasibility of the algorithm in satisfying the constraints. The key observation is that multiple resource types can be a better alternative to using a single type of resource.
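A minimal sketch of such a budget-and-deadline heuristic: for each VM type, compute the runtime and cost of the whole BoT, keep the feasible options, and pick the cheapest. The VM types, runtimes, and the greedy rule are illustrative assumptions, not the paper's algorithm:

```python
def schedule_bot(n_tasks, task_time_s, vm_types, budget, deadline_h):
    """vm_types: list of (hourly_price, relative_speedup), cheapest first."""
    feasible = []
    for price, speedup in vm_types:
        runtime_h = n_tasks * task_time_s / speedup / 3600
        cost = price * runtime_h
        if runtime_h <= deadline_h and cost <= budget:  # both hard constraints
            feasible.append((price, speedup, runtime_h, cost))
    # Among feasible options, pick the cheapest (assumed tie-break rule)
    return min(feasible, key=lambda p: p[3]) if feasible else None

vm_types = [(0.10, 1.0), (0.20, 2.2), (0.40, 4.5)]  # ($/h, speed), invented
print(schedule_bot(n_tasks=1000, task_time_s=30, vm_types=vm_types,
                   budget=5.0, deadline_h=4.0))
```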
A Cloud Services Brokerage (CSB) acts as an intermediary between cloud service providers (e.g., Amazon and Google) and cloud service end users, providing a number of value-adding services. CSBs as a research topic are in their infancy. The goal of this paper is to provide a concise survey of existing CSB technologies in a variety of areas and highlight a roadmap, which details five future opportunities for research.
Jun 02 2015
cs.DC arXiv:1506.00590v1
A Bag-of-Distributed-Tasks (BoDT) application is a collection of identical and independent tasks, each of which requires a piece of input data located around the world. As a result, Cloud computing offers an effective way to execute BoDT applications, as it not only consists of multiple geographically distributed data centres but also allows a user to pay only for what she actually uses. In this paper, we investigate executing BoDT applications on the Cloud using virtually unlimited cloud resources. A heuristic algorithm is proposed to find an execution plan that takes budget constraints into account. Compared with other approaches, with the same given budget, our algorithm is able to reduce the overall execution time by up to 50%.
Jun 01 2015
cs.DC arXiv:1505.08097v2
This paper presents the first complete, integrated and end-to-end solution for ad hoc cloud computing environments. Ad hoc clouds harvest resources from existing sporadically available, non-exclusive (i.e. primarily used for some other purpose) and unreliable infrastructures. In this paper we discuss the problems ad hoc cloud computing solves and outline our architecture which is based on BOINC.
Autonomously detecting and recovering from faults is one approach for reducing the operational complexity and costs associated with managing computing environments. We present a novel methodology for autonomously generating investigation leads that help identify systems faults, which extends our previous work in this area by leveraging Restricted Boltzmann Machines (RBMs) and contrastive divergence learning to analyse changes in historical feature data. This allows us to heuristically identify the root cause of a fault, and demonstrate an improvement to the state of the art by showing that feature data can be predicted heuristically beyond a single instance to include entire sequences of information.
Applications employed in the financial services industry to capture and estimate a variety of risk metrics are underpinned by stochastic simulations which are data, memory and computationally intensive. Many of these simulations are routinely performed on production-based computing systems. Ad hoc simulations in addition to routine simulations are required to obtain up-to-date views of risk metrics. Such simulations are currently not performed as they cannot be accommodated on production clusters, which are typically over-committed resources. Scalable, on-demand and pay-as-you-go Virtual Machines (VMs) offered by the cloud are a potential platform to satisfy the data, memory and computational constraints of the simulation. However, "Are clouds ready to accelerate ad hoc financial simulations?" The research reported in this paper aims to answer this question experimentally by developing and deploying an important financial simulation, referred to as 'Aggregate Risk Analysis', on the cloud. Parallel techniques to improve efficiency and performance of the simulations are explored. Challenges such as accommodating large input data on limited memory VMs and rapidly processing data for real-time use are surmounted. The key result of this investigation is that Aggregate Risk Analysis can be accommodated on cloud VMs. Acceleration of up to 24x using multiple hardware accelerators over the implementation on a single accelerator, 6x over a multiple core implementation and approximately 60x over a baseline implementation was achieved on the cloud. However, computational time is wasted for every dollar spent on the cloud due to poor acceleration over multiple virtual cores. Interestingly, private VMs can offer better performance than public VMs on comparable underlying hardware.
Nov 06 2014
cs.DC arXiv:1411.1215v1
This paper argues that there are three fundamental challenges that need to be overcome in order to foster the adoption of big data technologies in non-computer science related disciplines: addressing issues of accessibility of such technologies for non-computer scientists, supporting the ad hoc exploration of large data sets with minimal effort, and providing lightweight web-based frameworks for quick and easy analytics. In this paper, we address the above three challenges through the development of 'BigExcel', a three-tier web-based framework for exploring big data to facilitate the management of user interactions with large data sets, the construction of queries to explore the data set and the management of the infrastructure. The feasibility of BigExcel is demonstrated through two Yahoo Sandbox datasets. The first dataset is the Yahoo Buzz Score data set we use for quantitatively predicting trending technologies and the second is the Yahoo n-gram corpus we use for qualitatively inferring the coverage of important events. A demonstration of the BigExcel framework and source code is available at http://bigdata.cs.st-andrews.ac.uk/projects/bigexcel-exploring-big-data-for-social-sciences/.
How can applications be deployed on the cloud to achieve maximum performance? This question has become significant and challenging with the availability of a wide variety of Virtual Machines (VMs) with different performance capabilities in the cloud. The above question is addressed by proposing a six-step benchmarking methodology in which a user provides a set of four weights that indicate how important each of four groups (memory, processor, computation, and storage) is to the application that needs to be executed on the cloud. The weights along with cloud benchmarking data are used to generate a ranking of VMs that can maximise performance of the application. The rankings are validated through an empirical analysis using two case study applications; the first is a financial risk application and the second is a molecular dynamics simulation, which are both representative of workloads that can benefit from execution on the cloud. Both case studies validate the feasibility of the methodology and highlight that maximum performance can be achieved on the cloud by selecting the top ranked VMs produced by the methodology.
This paper proposes a novel framework for representing community know-how on the Semantic Web. Procedural knowledge generated by web communities typically takes the form of natural language instructions or videos and is largely unstructured. The absence of semantic structure impedes the deployment of many useful applications, in particular the ability to discover and integrate know-how automatically. We discuss the characteristics of community know-how and argue that existing knowledge representation frameworks fail to represent it adequately. We present a novel framework for representing the semantic structure of community know-how and demonstrate the feasibility of our approach by providing a concrete implementation which includes a method for automatically acquiring procedural knowledge for real-world tasks.
Oct 31 2014
cs.DC arXiv:1410.8357v1
Bag of Distributed Tasks (BoDT) can benefit from decentralised execution on the Cloud. However, there is a trade-off between the performance that can be achieved by employing a large number of Cloud VMs for the tasks and the monetary constraints that are often placed by a user. The research reported in this paper investigates this trade-off so that an optimal plan for deploying BoDT applications on the cloud can be generated. A heuristic algorithm, which considers the user's preference of performance and cost, is proposed and implemented. The feasibility of the algorithm is demonstrated by generating execution plans for a sample application. The key result is that the algorithm generates optimal execution plans for the application over 91% of the time.
Oct 31 2014
cs.DC arXiv:1410.8359v1
When orchestrating Web service workflows, the geographical placement of the orchestration engine(s) can greatly affect workflow performance. Data may have to be transferred across long geographical distances, which in turn increases execution time and degrades the overall performance of a workflow. In this paper, we present a framework that, given a DAG-based workflow specification, computes the optimal Amazon EC2 cloud regions to deploy the orchestration engines and execute a workflow. The framework incorporates a constraint model that solves the workflow deployment problem, which is generated using an automated constraint modelling system. The feasibility of the framework is evaluated by executing different sample workflows representative of scientific workloads. The experimental results indicate that the framework reduces the workflow execution time and provides a speed up of 1.3x-2.5x over centralised approaches.
Oct 23 2014
cs.DC arXiv:1410.5976v1
When orchestrating highly distributed and data-intensive Web service workflows the geographical placement of the orchestration engine can greatly affect the overall performance of a workflow. We present CloudForecast: a Web service framework and analysis tool which, given a workflow specification, computes the optimal Amazon EC2 Cloud region to automatically deploy the orchestration engine and execute the workflow. We use geographical distance of the workflow, network latency and HTTP round-trip time between Amazon Cloud regions and the workflow nodes to find a ranking of Cloud regions. This overall ranking predicts where the workflow orchestration engine should be deployed in order to reduce overall execution time. Our experimental results show that our proposed optimisation strategy, depending on the particular workflow, can speed up execution time on average by 82.25% compared to local execution.
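The ranking heuristic can be sketched as a weighted combination of the three metrics named above, averaged over the workflow's service nodes. The weights and the linear form are assumptions:

```python
def rank_regions(regions, weights=(1.0, 1.0, 1.0)):
    """regions: {name: [(distance_km, latency_ms, http_rtt_ms) per workflow node]}.

    Lower score is better; the best region comes first in the returned list.
    """
    w_d, w_l, w_r = weights
    scores = {
        name: sum(w_d * d + w_l * lat + w_r * rtt for d, lat, rtt in nodes) / len(nodes)
        for name, nodes in regions.items()
    }
    return sorted(scores, key=scores.get)

# Invented measurements from two candidate regions to two workflow nodes:
regions = {
    "us-east-1": [(500, 20, 45), (800, 30, 60)],
    "eu-west-1": [(6000, 90, 160), (5500, 85, 150)],
}
print(rank_regions(regions))  # ['us-east-1', 'eu-west-1']
```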
Sep 30 2014
cs.DC arXiv:1409.8098v1
Orchestrating service-oriented workflows is typically based on a design model that routes both data and control through a single point - the centralised workflow engine. This causes scalability problems that include the unnecessary consumption of the network bandwidth, high latency in transmitting data between the services, and performance bottlenecks. These problems are highly prominent when orchestrating workflows that are composed from services dispersed across distant geographical locations. This paper presents a novel workflow partitioning approach, which attempts to improve the scalability of orchestrating large-scale workflows. It permits the workflow computation to be moved towards the services providing the data in order to garner optimal performance results. This is achieved by decomposing the workflow into smaller sub-workflows for parallel execution, and determining the most appropriate network locations to which these sub-workflows are transmitted and subsequently executed. This paper demonstrates the efficiency of our approach using a set of experimental workflows that are orchestrated over Amazon EC2 and across several geographic network regions.
Jun 20 2014
cs.DC arXiv:1406.4974v1
This discussion paper argues that there are five fundamental pitfalls, which can restrict academics from conducting cloud computing research at the infrastructure level, which is currently where the vast majority of academic research lies. Instead academics should be conducting higher risk research, in order to gain understanding and open up entirely new areas. We call for a renewed mindset and argue that academic research should focus less upon physical infrastructure and embrace the abstractions provided by clouds through five opportunities: user-driven research, new programming models, PaaS environments, improved tools to support elasticity, and large-scale debugging. The objective of this paper is to foster discussion, and to define a roadmap forward, which will allow academia to make longer-term impacts to the cloud computing community.
Server clustering is a common design principle employed by many organisations who require high availability, scalability and easier management of their infrastructure. Servers are typically clustered according to the service they provide, for example, the application(s) installed, the role of the server, or server accessibility. In order to optimize performance, manage load and maintain availability, servers may migrate from one cluster group to another, making it difficult for server monitoring tools to continuously monitor these dynamically changing groups. Server monitoring tools are usually statically configured, and any change of group membership requires manual reconfiguration; this is an unreasonable task to undertake on large-scale cloud infrastructures. In this paper we present the Cloudlet Control and Management System (C2MS); a system for monitoring and controlling dynamic groups of physical or virtual servers within cloud infrastructures. The C2MS extends Ganglia - an open source scalable system performance monitoring tool - by allowing system administrators to define, monitor and modify server groups without the need for server reconfiguration. In turn, administrators can easily monitor group and individual server metrics on large-scale dynamic cloud infrastructures where roles of servers may change frequently. Furthermore, we complement group monitoring with a control element allowing administrator-specified actions to be performed over servers within service groups, as well as introduce further customized monitoring metrics. This paper outlines the design, implementation and evaluation of the C2MS.
Sep 26 2013
cs.DC arXiv:1309.6452v2
When orchestrating highly distributed and data-intensive Web service workflows the geographical placement of the orchestration engine can greatly affect the overall performance of a workflow. Orchestration engines are typically run from within an organisation's network, and may have to transfer data across long geographical distances, which in turn increases execution time and degrades the overall performance of a workflow. In this paper we present CloudForecast: a Web service framework and analysis tool which, given a workflow specification, computes the optimal Amazon EC2 Cloud region to automatically deploy the orchestration engine and execute the workflow. We use geographical distance of the workflow, network latency and HTTP round-trip time between Amazon Cloud regions and the workflow nodes to find a ranking of Cloud regions. This combined set of simple metrics effectively predicts where the workflow orchestration engine should be deployed in order to reduce overall execution time. We evaluate our approach by executing randomly generated data-intensive workflows deployed on the PlanetLab platform in order to rank Amazon EC2 Cloud regions. Our experimental results show that our proposed optimisation strategy, depending on the particular workflow, can speed up execution time on average by 82.25% compared to local execution. We also show that the standard deviation of execution time is reduced by an average of almost 65% using the optimisation strategy.
Sep 23 2013
cs.DB arXiv:1309.5821v1
The term big data has become ubiquitous. Owing to a shared origin between academia, industry and the media there is no single unified definition, and various stakeholders provide diverse and often contradictory definitions. The lack of a consistent definition introduces ambiguity and hampers discourse relating to big data. This short paper attempts to collate the various definitions which have gained some degree of traction and to furnish a clear and concise definition of an otherwise ambiguous term.
Large-scale ad hoc analytics of genomic data is commonly performed using the R programming language, supported by the 671 software packages provided by Bioconductor. More recently, analytical jobs are benefitting from on-demand computing and storage, their scalability and their low maintenance cost, all of which are offered by the cloud. While biologists and bioinformaticians can take an analytical job and execute it on their personal workstations, it remains challenging to seamlessly execute the job on the cloud infrastructure without extensive knowledge of the cloud dashboard. This paper explores not only how analytical jobs can be executed on the cloud with minimum effort, but also how both the resources and the data required by a job can be managed. An open-source lightweight framework for executing R-scripts using Bioconductor packages, referred to as 'RBioCloud', is designed and developed. RBioCloud offers a set of simple command-line tools for managing the cloud resources, the data and the execution of the job. Three biological test cases validate the feasibility of RBioCloud. The framework is publicly available from http://www.rbiocloud.com.
Analysis of information retrieved from microblogging services such as Twitter can provide valuable insight into public sentiment in a geographic region. This insight can be enriched by visualising information in its geographic context. Two underlying approaches for sentiment analysis are dictionary-based and machine learning. The former is popular for public sentiment analysis, and the latter has found limited use for aggregating public sentiment from Twitter data. The research presented in this paper aims to extend the machine learning approach for aggregating public sentiment. To this end, a framework for analysing and visualising public sentiment from a Twitter corpus is developed. A dictionary-based approach and a machine learning approach are implemented within the framework and compared using one UK case study, namely the royal birth of 2013. The case study validates the feasibility of the framework for analysis and rapid visualisation. One observation is that there is good correlation between the results produced by the popular dictionary-based approach and the machine learning approach when large volumes of tweets are analysed. However, for rapid analysis to be possible, faster methods need to be developed using big data techniques and parallel methods.
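Both approaches can be sketched compactly. The dictionary scorer sums lexicon polarities over tokens; the learning alternative is indicated with a standard scikit-learn pipeline. The lexicon and the two training tweets are placeholder assumptions:

```python
# Tiny placeholder lexicon mapping words to polarity scores
SENTIMENT_LEXICON = {"joy": 1, "happy": 1, "celebrate": 1, "sad": -1, "angry": -1}

def dictionary_score(tweet: str) -> int:
    """Dictionary-based approach: sum word polarities over the tweet."""
    return sum(SENTIMENT_LEXICON.get(w, 0) for w in tweet.lower().split())

# Machine learning alternative, trained on an assumed labelled corpus:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_tweets = ["so happy to celebrate", "this is sad news"]
train_labels = [1, -1]
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_tweets, train_labels)

print(dictionary_score("happy happy joy"),
      model.predict(["celebrate the happy day"]))
```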
Jun 07 2013
cs.DC arXiv:1306.1394v1
Cloud computing is a recent paradigm based around the notion of delivery of resources via a service model over the Internet. Despite being a new paradigm of computation, cloud computing owes its origins to a number of previous paradigms. The term cloud computing is well defined and no longer merits rigorous taxonomies to furnish a definition. Instead this survey paper considers the past, present and future of cloud computing. As an evolution of previous paradigms, we consider the predecessors to cloud computing and what significance they still hold to cloud services. Additionally we examine the technologies which comprise cloud computing and how the challenges and future developments of these technologies will influence the field. Finally we examine the challenges that limit the growth, application and development of cloud computing and suggest directions required to overcome these challenges in order to further the success of cloud computing.
The Berkeley Open Infrastructure for Network Computing (BOINC) is an open source client-server middleware system created to allow projects with large computational requirements, usually set in the scientific domain, to utilize a technically unlimited number of volunteer machines distributed over large physical distances. However, various problems exist when deploying applications over these heterogeneous machines using BOINC: applications must be ported to each machine architecture type, the project server must be trusted to supply authentic applications, applications that do not regularly checkpoint may lose execution progress upon volunteer machine termination, and applications that have dependencies may find it difficult to run under BOINC. To solve such problems we introduce virtual BOINC, or V-BOINC, where virtual machines are used to run computations on volunteer machines. Application developers can then compile their applications on a single architecture, checkpointing issues are solved through virtualization APIs and many security concerns are addressed via the virtual machine's sandbox environment. In this paper we focus on outlining a unique approach to introducing virtualization into BOINC and demonstrate that V-BOINC offers acceptable computational performance when compared to regular BOINC. Finally we show that applications with dependencies can easily run under V-BOINC, in turn increasing the computational potential volunteer computing offers to the general public and project developers.
Jun 03 2013
cs.DC arXiv:1305.7403v1
Monitoring is an essential aspect of maintaining and developing computer systems that increases in difficulty proportional to the size of the system. The need for robust monitoring tools has become more evident with the advent of cloud computing. Infrastructure as a Service (IaaS) clouds allow end users to deploy vast numbers of virtual machines as part of dynamic and transient architectures. Current monitoring solutions, including many of those in the open-source domain, rely on outdated concepts, including manual deployment and configuration and centralised data collection, and adapt poorly to membership churn. In this paper we propose the development of a cloud monitoring suite to provide scalable and robust lookup, data collection and analysis services for large-scale cloud systems. In lieu of centrally managed monitoring, we propose a multi-tier architecture using a layered gossip protocol to aggregate monitoring information and facilitate lookup, information collection and the identification of redundant capacity. This allows for a resource-aware data collection and storage architecture that operates over the system being monitored. This in turn enables monitoring to be done in-situ without the need for significant additional infrastructure to facilitate monitoring services. We evaluate this approach against alternative monitoring paradigms and demonstrate how our solution is well adapted to usage in a cloud-computing context.
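The gossip-based aggregation at the heart of this design can be illustrated with a toy round of push-pull averaging, in which every node converges on the cluster-wide mean without a central collector; the single layer shown is a simplification of the layered protocol:

```python
import random

def gossip_mean(values, rounds=30, seed=0):
    """Each round, two random peers average their estimates (push-pull)."""
    rng = random.Random(seed)
    est = list(values)
    for _ in range(rounds):
        i, j = rng.sample(range(len(est)), 2)
        est[i] = est[j] = (est[i] + est[j]) / 2  # sum is conserved, so
    return est                                   # estimates converge to the mean

cpu_load = [0.9, 0.2, 0.4, 0.7, 0.1, 0.6]
print(gossip_mean(cpu_load))  # all estimates approach the true mean ~0.483
```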
Orchestrating centralised service-oriented workflows presents significant scalability challenges that include: the consumption of network bandwidth, degradation of performance, and single points of failure. This paper presents a high-level dataflow specification language that attempts to address these scalability challenges. This language provides simple abstractions for orchestrating large-scale web service workflows, and separates the workflow logic from its execution. It is based on a data-driven model that permits parallelism to improve the workflow performance. We provide a decentralised architecture that allows the computation logic to be moved "closer" to services involved in the workflow. This is achieved through partitioning the workflow specification into smaller fragments that may be sent to remote orchestration services for execution. The orchestration services rely on proxies that exploit connectivity to services in the workflow. These proxies perform service invocations and compositions on behalf of the orchestration services, and carry out data collection, retrieval, and mediation tasks. The evaluation of our architecture implementation concludes that our decentralised approach reduces the execution time of workflows, and scales accordingly with the increasing size of data sets.
May 09 2013
cs.DC arXiv:1305.1842v1
Service-oriented workflows are typically executed using a centralised orchestration approach that presents significant scalability challenges. These challenges include the consumption of network bandwidth, degradation of performance, and single-points of failure. We provide a decentralised orchestration architecture that attempts to address these challenges. Our architecture adopts a design model that permits the computation to be moved "closer" to services in a workflow. This is achieved by partitioning workflows specified using our simple dataflow language into smaller fragments, which may be sent to remote locations for execution.
As the number of services and the size of data involved in workflows increases, centralised orchestration techniques are reaching the limits of scalability. In the classic orchestration model, all data passes through a centralised engine, which results in unnecessary data transfer, wasted bandwidth and the engine becoming a bottleneck to the execution of a workflow. This paper presents and evaluates the Circulate architecture, which maintains the robustness and simplicity of centralised orchestration, but facilitates choreography by allowing services to exchange data directly with one another. Circulate could be realised within any existing workflow framework; in this paper, we focus on WS-Circulate, a Web services-based implementation. Taking inspiration from the Montage workflow, a number of common workflow patterns (sequence, fan-in and fan-out), input to output data size relationships and network configurations are identified and evaluated. The performance analysis concludes that a substantial reduction in communication overhead results in a 2-4 fold performance benefit across all patterns. An end-to-end pattern through the Montage workflow results in an 8 fold performance benefit and demonstrates how the advantage of using the Circulate architecture increases as the complexity of a workflow grows.
Within the context of the EU Design Study Developmental Gene Expression Map, we identify a set of challenges when facilitating collaborative research on early human embryo development. These challenges bring forth requirements, for which we have identified solutions and technology. We summarise our solutions and demonstrate how they integrate to form an e-infrastructure to support collaborative research in this area of developmental biology.