subscribe to arXiv mailings

ODIN: Overcoming Dynamic Interference in iNference pipelines

Authors: Pirah Noor Soomro, Nikela Papadopoulou, Miquel Pericàs

Abstract: As an increasing number of businesses becomes powered by machine-learning, inference becomes a core operation, with a growing trend to be offered as a service. In this context, the inference task must meet certain service-level objectives (SLOs), such as high throughput and low latency. However, these targets can be compromised by interference caused by long- or short-lived co-located tasks. Prior… ▽ More As an increasing number of businesses becomes powered by machine-learning, inference becomes a core operation, with a growing trend to be offered as a service. In this context, the inference task must meet certain service-level objectives (SLOs), such as high throughput and low latency. However, these targets can be compromised by interference caused by long- or short-lived co-located tasks. Prior works focus on the generic problem of co-scheduling to mitigate the effect of interference on the performance-critical task. In this work, we focus on inference pipelines and propose ODIN, a technique to mitigate the effect of interference on the performance of the inference task, based on the online scheduling of the pipeline stages. Our technique detects interference online and automatically re-balances the pipeline stages to mitigate the performance degradation of the inference task. We demonstrate that ODIN successfully mitigates the effect of interference, sustaining the latency and throughput of CNN inference, and outperforms the least-loaded scheduling (LLS), a common technique for interference mitigation. Additionally, it is effective in maintaining service-level objectives for inference, and it is scalable to large network models executing on multiple processing elements. △ Less

Submitted 2 June, 2023; originally announced June 2023.

Comments: To appear at Euro-Par 2023

arXiv:2202.11575 [pdf, other]

Shisha: Online scheduling of CNN pipelines on heterogeneous architectures

Authors: Pirah Noor Soomro, Mustafa Abduljabbar, Jeronimo Castrillon, Miquel Pericàs

Abstract: Chiplets have become a common methodology in modern chip design. Chiplets improve yield and enable heterogeneity at the level of cores, memory subsystem and the interconnect. Convolutional Neural Networks (CNNs) have high computational, bandwidth and memory capacity requirements owing to the increasingly large amount of weights. Thus to exploit chiplet-based architectures, CNNs must be optimized i… ▽ More Chiplets have become a common methodology in modern chip design. Chiplets improve yield and enable heterogeneity at the level of cores, memory subsystem and the interconnect. Convolutional Neural Networks (CNNs) have high computational, bandwidth and memory capacity requirements owing to the increasingly large amount of weights. Thus to exploit chiplet-based architectures, CNNs must be optimized in terms of scheduling and workload distribution among computing resources. We propose Shisha, an online approach to generate and schedule parallel CNN pipelines on chiplet architectures. Shisha targets heterogeneity in compute performance and memory bandwidth and tunes the pipeline schedule through a fast online exploration technique. We compare Shisha with Simulated Annealing, Hill Climbing and Pipe-Search. On average, the convergence time is improved by ~35x in Shisha compared to other exploration algorithms. Despite the quick exploration, Shisha's solution is often better than that of other heuristic exploration algorithms. △ Less

Submitted 4 December, 2022; v1 submitted 23 February, 2022; originally announced February 2022.

arXiv:2009.00915 [pdf, other]

Scheduling Task-parallel Applications in Dynamically Asymmetric Environments

Authors: Jing Chen, Pirah Noor Soomro, Mustafa Abduljabbar, Madhavan Manivannan, Miquel Pericas

Abstract: Shared resource interference is observed by applications as dynamic performance asymmetry. Prior art has developed approaches to reduce the impact of performance asymmetry mainly at the operating system and architectural levels. In this work, we study how application-level scheduling techniques can leverage moldability (i.e. flexibility to work as either single-threaded or multithreaded task) and… ▽ More Shared resource interference is observed by applications as dynamic performance asymmetry. Prior art has developed approaches to reduce the impact of performance asymmetry mainly at the operating system and architectural levels. In this work, we study how application-level scheduling techniques can leverage moldability (i.e. flexibility to work as either single-threaded or multithreaded task) and explicit knowledge on task criticality to handle scenarios in which system performance is not only unknown but also changing over time. Our proposed task scheduler dynamically learns the performance characteristics of the underlying platform and uses this knowledge to devise better schedules aware of dynamic performance asymmetry, hence reducing the impact of interference. Our evaluation shows that both criticality-aware scheduling and parallelism tuning are effective schemes to address interference in both shared and distributed memory applications △ Less

Submitted 22 September, 2020; v1 submitted 2 September, 2020; originally announced September 2020.

Comments: Published in ICPP Workshops '20

arXiv:1912.01563 [pdf, other]

LEGaTO: Low-Energy, Secure, and Resilient Toolset for Heterogeneous Computing

Authors: B. Salami, K. Parasyris, A. Cristal, O. Unsal, X. Martorell, P. Carpenter, R. De La Cruz, L. Bautista, D. Jimenez, C. Alvarez, S. Nabavi, S. Madonar, M. Pericas, P. Trancoso, M. Abduljabbar, J. Chen, P. N. Soomro, M Manivannan, M. Berge, S. Krupop, F. Klawonn, Al Mekhlafi, S. May, T. Becker, G. Gaydadjiev , et al. (20 additional authors not shown)

Abstract: The LEGaTO project leverages task-based programming models to provide a software ecosystem for Made in-Europe heterogeneous hardware composed of CPUs, GPUs, FPGAs and dataflow engines. The aim is to attain one order of magnitude energy savings from the edge to the converged cloud/HPC, balanced with the security and resilience challenges. LEGaTO is an ongoing three-year EU H2020 project started in… ▽ More The LEGaTO project leverages task-based programming models to provide a software ecosystem for Made in-Europe heterogeneous hardware composed of CPUs, GPUs, FPGAs and dataflow engines. The aim is to attain one order of magnitude energy savings from the edge to the converged cloud/HPC, balanced with the security and resilience challenges. LEGaTO is an ongoing three-year EU H2020 project started in December 2017. △ Less

Submitted 1 December, 2019; originally announced December 2019.

Comments: 6 pages, 9 figures

arXiv:1905.00673 [pdf, other]

An Adaptive Performance-oriented Scheduler for Static and Dynamic Heterogeneity

Authors: Jing Chen, Pirah Noor Soomro, Mustafa Abduljabbar, Miquel Pericàs

Abstract: With the emergence of heterogeneous hardware paving the way for the post-Moore era, it is of high importance to adapt the runtime scheduling to the platform's heterogeneity. To enhance adaptive and responsive scheduling, we introduce a Performance Trace Table (PTT) into XiTAO, a framework for elastic scheduling of mixed-mode parallelism. The PTT is an extensible and dynamic lightweight manifest of… ▽ More With the emergence of heterogeneous hardware paving the way for the post-Moore era, it is of high importance to adapt the runtime scheduling to the platform's heterogeneity. To enhance adaptive and responsive scheduling, we introduce a Performance Trace Table (PTT) into XiTAO, a framework for elastic scheduling of mixed-mode parallelism. The PTT is an extensible and dynamic lightweight manifest of the per-core latency that can be used to guide the scheduling of both critical and non-critical tasks. By understanding the per-task latency, the PTT can infer task performance, intra-application interference as well as inter-application interference. We run random Direct Acyclic Graphs (DAGs) of different workload categories as a benchmark on NVIDIA Jetson TX2 chip, achieving up to 3.25x speedup over a standard work-stealing scheduler. To exemplify scheduling adaption to interference, we run DAGs with high parallelism and analyze the scheduler's response to interference from a background process on an Intel Haswell (2650v3) multicore workstation. We also showcase the XiTAO's scheduling performance by porting the VGG-16 image classification framework based on Convolutional Neural Networks (CNN). △ Less

Submitted 30 December, 2020; v1 submitted 2 May, 2019; originally announced May 2019.

Showing 1–5 of 5 results for author: Soomro, P N