4 results for au:Joosen_A in:cs

Serverless Cold Starts and Where to Find Them
Artjom Joosen, Ahmed Hassan, Martin Asenov, Rajkarn Singh, Luke Darlow, Jianfeng Wang, Qiwen Deng, Adam Barker
Oct 10 2024 cs.DC cs.OS cs.PF arXiv:2410.06145v1

@misc{2410.06145, author = {Artjom Joosen and Ahmed Hassan and Martin Asenov and Rajkarn Singh and Luke Darlow and Jianfeng Wang and Qiwen Deng and Adam Barker}, title = {{S}erverless {C}old {S}tarts and {W}here to {F}ind {T}hem}, year = {2024}, eprint = {2410.06145}, note = {arXiv:2410.06145v1} }
This paper releases and analyzes a month-long trace of 85 billion user requests and 11.9 million cold starts from Huawei's serverless cloud platform. Our analysis spans workloads from five data centers. We focus on cold starts and provide a comprehensive examination of the underlying factors influencing the number and duration of cold starts. These factors include trigger types, request synchronicity, runtime languages, and function resource allocations. We investigate components of cold starts, including pod allocation time, code and dependency deployment time, and scheduling delays, and examine their relationships with runtime languages, trigger types, and resource allocation. We introduce pod utility ratio to measure the pod's useful lifetime relative to its cold start time, giving a more complete picture of cold starts, and see that some pods with long cold start times have longer useful lifetimes. Our findings reveal the complexity and multifaceted origins of the number, duration, and characteristics of cold starts, driven by differences in trigger types, runtime languages, and function resource allocations. For example, cold starts in Region 1 take up to 7 seconds, dominated by dependency deployment time and scheduling. In Region 2, cold starts take up to 3 seconds and are dominated by pod allocation time. Based on this, we identify opportunities to reduce the number and duration of cold starts using strategies for multi-region scheduling. Finally, we suggest directions for future research to address these challenges and enhance the performance of serverless cloud platforms. Our datasets and code are available at https://github.com/sir-lab/data-release
DAM: Towards A Foundation Model for Time Series Forecasting
Luke Darlow, Qiwen Deng, Ahmed Hassan, Martin Asenov, Rajkarn Singh, Artjom Joosen, Adam Barker, Amos Storkey
Jul 26 2024 cs.LG arXiv:2407.17880v1

@misc{2407.17880, author = {Luke Darlow and Qiwen Deng and Ahmed Hassan and Martin Asenov and Rajkarn Singh and Artjom Joosen and Adam Barker and Amos Storkey}, title = {{DAM}: {T}owards {A} {F}oundation {M}odel for {T}ime {S}eries {F}orecasting}, year = {2024}, eprint = {2407.17880}, note = {arXiv:2407.17880v1} }
It is challenging to scale time series forecasting models such that they forecast accurately for multiple distinct domains and datasets, all with potentially different underlying collection procedures (e.g., sample resolution), patterns (e.g., periodicity), and prediction requirements (e.g., reconstruction vs. forecasting). We call this general task universal forecasting. Existing methods usually assume that input data is regularly sampled, and they forecast to pre-determined horizons, resulting in failure to generalise outside of the scope of their training. We propose the DAM, a neural model that takes randomly sampled histories and outputs an adjustable basis composition as a continuous function of time for forecasting to non-fixed horizons. It involves three key components: (1) a flexible approach for using randomly sampled histories from a long-tail distribution, that enables an efficient global perspective of the underlying temporal dynamics while retaining focus on the recent history; (2) a transformer backbone that is trained on these actively sampled histories to produce, as representational output, (3) the basis coefficients of a continuous function of time. We show that a single univariate DAM, trained on 25 time series datasets, either outperformed or closely matched existing SoTA models at multivariate long-term forecasting across 18 datasets, including 8 held-out for zero-shot transfer, even though these models were trained to specialise for each dataset-horizon combination. This single DAM excels at zero-shot transfer and very-long-term forecasting, performs well at imputation, is interpretable via basis function composition and attention, can be tuned for different inference-cost requirements, and is robust to missing and irregularly sampled data by design.
How Does It Function? Characterizing Long-term Trends in Production Serverless Workloads
Artjom Joosen, Ahmed Hassan, Martin Asenov, Rajkarn Singh, Luke Darlow, Jianfeng Wang, Adam Barker
Dec 19 2023 cs.PF cs.DC cs.LG arXiv:2312.10127v1

@misc{2312.10127, author = {Artjom Joosen and Ahmed Hassan and Martin Asenov and Rajkarn Singh and Luke Darlow and Jianfeng Wang and Adam Barker}, title = {{H}ow {D}oes {I}t {F}unction? {C}haracterizing {L}ong-term {T}rends in {P}roduction {S}erverless {W}orkloads}, year = {2023}, eprint = {2312.10127}, howpublished = {SoCC '23: Proceedings of the 2023 ACM Symposium on Cloud Computing, October 2023, Pages 443-458}, doi = {10.1145/3620678.3624783}, note = {arXiv:2312.10127v1} }
This paper releases and analyzes two new Huawei cloud serverless traces. The traces span a period of over 7 months with over 1.4 trillion function invocations combined. The first trace is derived from Huawei's internal workloads and contains detailed per-second statistics for 200 functions running across multiple Huawei cloud data centers. The second trace is a representative workload from Huawei's public FaaS platform. This trace contains per-minute arrival rates for over 5000 functions running in a single Huawei data center. We present the internals of a production FaaS platform by characterizing resource consumption, cold-start times, programming languages used, periodicity, per-second versus per-minute burstiness, correlations, and popularity. Our findings show that there is considerable diversity in how serverless functions behave: requests vary by up to 9 orders of magnitude across functions, with some functions executed over 1 billion times per day; scheduling time, execution time and cold-start distributions vary across 2 to 4 orders of magnitude and have very long tails; and function invocation counts demonstrate strong periodicity for many individual functions and on an aggregate level. Our analysis also highlights the need for further research in estimating resource reservations and time-series prediction to account for the huge diversity in how serverless functions behave. Datasets and code are available at https://github.com/sir-lab/data-release
Privacy-preserving Object Detection
Peiyang He, Charlie Griffin, Krzysztof Kacprzyk, Artjom Joosen, Michael Collyer, Aleksandar Shtedritski, Yuki M. Asano
Mar 12 2021 cs.CV arXiv:2103.06587v1

@misc{2103.06587, author = {Peiyang He and Charlie Griffin and Krzysztof Kacprzyk and Artjom Joosen and Michael Collyer and Aleksandar Shtedritski and Yuki M.~Asano}, title = {{P}rivacy-preserving {O}bject {D}etection}, year = {2021}, eprint = {2103.06587}, note = {arXiv:2103.06587v1} }
Privacy considerations and bias in datasets are quickly becoming high-priority issues that the computer vision community needs to face. So far, little attention has been given to practical solutions that do not involve collection of new datasets. In this work, we show that for object detection on COCO, both anonymizing the dataset by blurring faces and swapping faces in a balanced manner along the gender and skin tone dimensions can retain object detection performance while preserving privacy and partially balancing bias.