TEES: topology-aware execution environment service for fast and agile application deployment in HPC
College of Computer, National University of Defense Technology, Changsha 410073, China
Received: 2021-06-16; Accepted: 2022-10-26; Available online: 2022-10-26
High-performance computing (HPC) systems are about to reach a new height: exascale. Application deployment is becoming an increasingly prominent problem. Container technology solves the problems of encapsulation and migration of applications and their execution environment. However, the container image is too large, and deploying the image to a large number of compute nodes is time-consuming. Although the peer-to-peer (P2P) approach brings higher transmission efficiency, it introduces a larger network load. All of these issues lead to high application startup latency. To solve these problems, we propose the topology-aware execution environment service (TEES) for fast and agile application deployment on HPC systems. TEES creates a more lightweight execution environment for users and uses a more efficient topology-aware P2P approach to reduce deployment time. Combined with a split-step transport and launch-in-advance mechanism, TEES reduces application startup latency. On the Tianhe HPC system, TEES deploys and starts a typical application on 17 560 compute nodes within 3 s. Compared with container-based application deployment, TEES is 12 times faster and reduces the network load by 85%.