Journal Home Online First Current Issue Archive For Authors Journal Information 中文版

Frontiers of Information Technology & Electronic Engineering >> 2022, Volume 23, Issue 1 doi: 10.1631/FITEE.2100331

Multi-agent deep reinforcement learning for end–edge orchestrated resource allocation in industrial wireless networks

Affiliation(s): State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China; Key Laboratory of Networked Control Systems, Chinese Academy of Sciences, Shenyang 110016, China; Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110169, China; University of Chinese Academy of Sciences, Beijing 100049, China; less

Received:2021-07-05 Accepted: 2022-01-24 Available online:2022-01-24

Next Previous


Edge artificial intelligence will empower the ever simple (IWNs) supporting complex and dynamic tasks by collaboratively exploiting the computation and communication resources of both machine-type devices (MTDs) and edge servers. In this paper, we propose a based resource allocation (MADRL-RA) algorithm for IWNs to support computation-intensive and -sensitive applications. First, we present the system model of IWNs, wherein each MTD is regarded as a self-learning agent. Then, we apply the Markov decision process to formulate a minimum system overhead problem with joint optimization of and . Next, we employ MADRL to defeat the explosive state space and learn an effective resource allocation policy with respect to computing decision, computation capacity, and transmission power. To break the time correlation of training data while accelerating the learning process of MADRL-RA, we design a weighted experience replay to store and sample experiences categorically. Furthermore, we propose a step-by-step -greedy method to balance exploitation and exploration. Finally, we verify the effectiveness of MADRL-RA by comparing it with some benchmark algorithms in many experiments, showing that MADRL-RA converges quickly and learns an effective resource allocation policy achieving the minimum system overhead.

Related Research