DRL-based Multi-Stream Scheduling of Inference Pipelines on Edge Devices
Document Type
Conference Article
Publication Title
Proceedings of the IEEE International Conference on VLSI Design
Abstract
Real-time scheduling of multiple neural network-based inference pipelines on Graphics Processing Unit (GPU) based edge devices is an active area of research. Applications like Advanced Driver-Assistance Systems (ADAS) execute multiple such inference pipelines to make informed decisions about driving scenarios. The real-time performance of ADAS is often constrained by limited platform resources, and the resulting execution latency can lead to deadline violations. In this regard, modern GPUs support concurrent execution of multiple compute streams. However, the literature lacks scheduling strategies that exploit multiple such compute streams and focus on the concurrent execution of inference pipelines for more efficient real-time scheduling. In this paper, we address this issue by proposing a Deep Reinforcement Learning (DRL) based solution for multi-stream scheduling of inference pipelines on edge GPUs. Using DRL, we learn how to map each layer of the target inference pipelines to a high- or low-priority stream while satisfying task-level deadline requirements. Experimental evaluation shows the efficacy of the proposed approach compared to baseline approaches.
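To illustrate the stream mechanism the abstract refers to, the following is a minimal sketch (not the paper's actual method) of dispatching individual layers of an inference pipeline onto high- and low-priority GPU compute streams using PyTorch. The layer-to-stream assignment `layer_to_stream`, the toy model, and the priority values are illustrative assumptions; in the paper this mapping would be produced by the learned DRL policy.

```python
import torch
import torch.nn as nn

device = torch.device("cuda")

# Toy inference pipeline standing in for a real perception network.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
).to(device).eval()

# Hypothetical per-layer assignment (1 = high-priority stream, 0 = low-priority stream),
# e.g., as emitted by a scheduling policy; values here are placeholders.
layer_to_stream = [1, 0, 0, 1]

# Lower priority numbers request higher scheduling priority; the usable range is device-dependent.
high = torch.cuda.Stream(device=device, priority=-1)
low = torch.cuda.Stream(device=device, priority=0)
streams = [low, high]

x = torch.randn(1, 3, 224, 224, device=device)

with torch.no_grad():
    prev_stream = torch.cuda.current_stream(device)
    for layer, sid in zip(model, layer_to_stream):
        s = streams[sid]
        # Preserve the data dependency between consecutive layers
        # even when they are issued on different streams.
        s.wait_stream(prev_stream)
        with torch.cuda.stream(s):
            # Inform the caching allocator that the input is consumed on this
            # stream so its memory is not reused prematurely.
            x.record_stream(s)
            x = layer(x)
        prev_stream = s
    torch.cuda.current_stream(device).wait_stream(prev_stream)

torch.cuda.synchronize()
```

In practice, high-priority streams let latency-critical layers preempt or overtake work queued on low-priority streams, which is the hardware feature the proposed DRL scheduler exploits.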
First Page
324
Last Page
329
DOI
10.1109/VLSID60093.2024.00060
Publication Date
1-1-2024
Recommended Citation
Pereira, Danny; Ghosh, Sumana; and Dey, Soumyajit, "DRL-based Multi-Stream Scheduling of Inference Pipelines on Edge Devices" (2024). Conference Articles. 855.
https://digitalcommons.isical.ac.in/conf-articles/855