
Frontiers of Information Technology & Electronic Engineering, 2023, Volume 24, Issue 11. doi: 10.1631/FITEE.2300084

Embedding expert demonstrations into clustering buffer for effective deep reinforcement learning

Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai 200433, China; Institute of Science and Technology for Brain-inspired Intelligence, Fudan University, Shanghai 200433, China

Received: 2023-02-12 Accepted: 2023-12-04 Available online: 2023-12-04


Abstract

As one of the most fundamental topics in reinforcement learning (RL), sample efficiency is essential to the deployment of deep RL algorithms. Unlike most existing exploration methods that sample an action from different types of posterior distributions, we focus on the policy sampling process and propose an efficient selective sampling approach that improves sample efficiency by modeling the internal hierarchy of the environment. Specifically, we first employ clustering in the policy sampling process to generate an action candidate set. Then we introduce a clustering buffer for modeling the internal hierarchy, which consists of on-policy data, off-policy data, and expert data, to evaluate actions from the clusters in the action candidate set during the exploration stage. In this way, our approach is able to take advantage of the supervision information in the expert demonstration data. Experiments on six different continuous locomotion environments demonstrate the superior performance and faster convergence of selective sampling. In particular, on the LGSVL task, our method can reduce the number of convergence steps by 46.7% and the convergence time by 28.5%. Furthermore, our code is open source for reproducibility and is available at https://github.com/Shihwin/SelectiveSampling.
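To make the selective sampling idea concrete, the sketch below (Python) shows one plausible reading of the abstract: candidate actions are drawn from the policy, grouped into clusters, and the executed action is taken from the cluster that best matches expert behavior stored in a buffer mixing on-policy, off-policy, and expert data. All names here (ClusteringBuffer, score_cluster, n_candidates, n_clusters) and the scoring rule are illustrative assumptions, not the authors' implementation; see the linked repository for the actual code.

import numpy as np
from sklearn.cluster import KMeans


class ClusteringBuffer:
    """Toy buffer holding (state, action) pairs from three sources.
    Hypothetical stand-in for the paper's clustering buffer."""

    def __init__(self):
        self.data = {"on_policy": [], "off_policy": [], "expert": []}

    def add(self, source, state, action):
        self.data[source].append((np.asarray(state), np.asarray(action)))

    def score_cluster(self, state, cluster_center):
        # Score a cluster center by its similarity to expert actions taken
        # in nearby states (higher is better). Purely illustrative.
        expert = self.data["expert"]
        if not expert:
            return 0.0
        weights, sims = [], []
        for s, a in expert:
            weights.append(np.exp(-np.linalg.norm(s - state)))        # state similarity
            sims.append(-np.linalg.norm(a - cluster_center))          # action similarity
        return float(np.average(sims, weights=weights))


def selective_sample(policy_sample, state, buffer, n_candidates=64, n_clusters=4):
    """Draw candidate actions, cluster them, and sample from the best cluster."""
    candidates = np.stack([policy_sample(state) for _ in range(n_candidates)])
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(candidates)
    scores = [buffer.score_cluster(state, c) for c in km.cluster_centers_]
    best = int(np.argmax(scores))
    members = candidates[km.labels_ == best]
    return members[np.random.randint(len(members))]

In the paper's setting the buffer would be populated during training from rollouts and demonstration data; here it only serves to illustrate how expert supervision can bias which action cluster is sampled during exploration.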
