Frontiers of Information Technology & Electronic Engineering >> 2023, Volume 24, Issue 1 doi: 10.1631/FITEE.2100298

Resource scheduling techniques in cloud from a view of coordination: a holistic survey

Affiliation(s): School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; Center of Heterogeneous Intelligent Computer Architecture and Systems, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China

Received: 2021-06-24 Accepted: 2023-01-21 Available online: 2023-01-21

Abstract

Managing resource contention in shared clouds remains an open problem. The evolution and deployment of new application paradigms (e.g., deep learning training and s) and custom hardware (e.g., graphics processing units (GPUs) and tensor processing units (TPUs)) have posed new challenges for resource management system design. Current solutions tend to trade cluster efficiency for guaranteed application performance, e.g., through resource over-allocation, which leaves many resources underutilized. Overcoming this dilemma is not easy, because components across the whole software stack are involved. Nevertheless, massive efforts have been devoted to seeking effective performance isolation and highly efficient resource scheduling. The goal of this paper is to systematically survey these techniques from the coordination perspective and to identify the trends they indicate. Briefly, four topics are covered. First, isolation mechanisms deployed at different levels (micro-architecture, system, and virtualization levels) are reviewed, including GPU multitasking methods. Second, resource scheduling techniques within an individual machine and at the cluster level are investigated, respectively; in particular, GPU scheduling for deep learning applications is described in detail. Third, adaptive resource management including the latest -related research is thoroughly explored. Finally, future research directions are discussed in the light of advanced work. We hope that this review will help researchers establish a global view of the landscape of resource management techniques in shared clouds and see technology trends more clearly.
