Resource Type

Journal Article 343

Year

2024 1

2023 58

2022 49

2021 46

2020 37

2019 34

2018 24

2017 22

2016 8

2015 6

2014 3

2013 3

2012 2

2011 2

2010 4

2009 3

2008 4

2007 5

2006 5

2005 3


Keywords

Machine learning 42

Deep learning 34

Artificial intelligence 14

Reinforcement learning 14

Active learning 4

Process intensification 4

Bayesian optimization 3

Big data 3

Adaptive dynamic programming 2

Additive manufacturing 2

Attention 2

Autonomous driving 2

Autonomous learning 2

CCUS 2

COVID-19 2

Chemical absorption 2

Chemical engineering 2

Chemical looping 2

Data-driven 2



Pre-training with asynchronous supervised learning for reinforcement learning based autonomous driving Research Articles

Yunpeng Wang, Kunxian Zheng, Daxin Tian, Xuting Duan, Jianshan Zhou,ypwang@buaa.edu.cn,zhengkunxian@buaa.edu.cn,dtian@buaa.edu.cn,duanxuting@buaa.edu.cn

Frontiers of Information Technology & Electronic Engineering 2021, Volume 22, Issue 5,   Pages 615-766 doi: 10.1631/FITEE.1900637

Abstract: Rule-based autonomous driving systems may suffer from increased complexity with large-scale inter-coupled rules, so many researchers are exploring learning-based approaches. Reinforcement learning (RL) has been applied in designing autonomous driving systems because of its outstanding performance on a wide variety of sequential control problems. However, poor initial performance is a major challenge to the practical implementation of an RL-based autonomous driving system. RL training requires extensive training data before the model achieves reasonable performance, making an RL-based model inapplicable in a real-world setting, particularly when data are expensive. We propose an asynchronous supervised learning (ASL) method for the RL-based end-to-end autonomous driving model to address the problem of poor initial performance before training this RL-based model in real-world settings. Specifically, prior knowledge is introduced in the ASL pre-training stage by asynchronously executing multiple supervised learning processes in parallel on multiple driving demonstration data sets. After pre-training, the model is deployed on a real vehicle to be further trained by RL to adapt to the real environment and continuously push beyond its previous performance limit. The presented pre-training method is evaluated on the race car simulator TORCS (The Open Racing Car Simulator) to verify that it can reliably improve the initial performance and convergence speed of an end-to-end autonomous driving model in the RL training stage. In addition, a real-vehicle verification system is built to verify the feasibility of the proposed pre-training method in a real-vehicle deployment. Simulation results show that using demonstrations during a supervised pre-training stage allows significant improvements in initial performance and convergence speed in the RL training stage.

Keywords: Autonomous driving; Autonomous vehicles; Reinforcement learning; Supervised learning

MDLB: a metadata dynamic load balancing mechanism based on reinforcement learning Research Articles

Zhao-qi Wu, Jin Wei, Fan Zhang, Wei Guo, Guang-wei Xie,17034203@qq.com

Frontiers of Information Technology & Electronic Engineering 2020, Volume 21, Issue 7,   Pages 963-1118 doi: 10.1631/FITEE.1900121

Abstract: With the growing amount of information and data, object-based storage systems have been widely used in many applications, including the Google File System, Amazon S3, Hadoop Distributed File System, and Ceph, in which load balancing of metadata plays an important role in improving the input/output performance of the entire system. An unbalanced load on the metadata server leads to a serious bottleneck in system performance. However, most existing load balancing strategies, which are based on subtree segmentation or hashing, lack good dynamics and adaptability. In this study, we propose a metadata dynamic load balancing (MDLB) mechanism based on reinforcement learning (RL). The Q_learning algorithm is adopted, and our RL-based strategy consists of three modules, i.e., the policy selection network, the load balancing network, and the parameter update network. Experimental results show that the proposed MDLB algorithm can adjust the load dynamically according to the performance of the metadata servers, and that it has good adaptability in the case of sudden changes in data volume.

Keywords: Object-based storage system; Metadata; Dynamic load balancing; Reinforcement learning; Q_learning
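The Q_learning keyword above refers to the classic tabular update. As a rough illustration only (not the paper's implementation; the load states, actions, and constants below are hypothetical), one update step looks like:

```python
# Minimal tabular Q-learning update of the kind underlying RL-based
# load-balancing policies (toy sketch; state/action encoding is hypothetical).
ALPHA, GAMMA = 0.1, 0.9  # learning rate and discount factor

def q_update(Q, s, a, r, s_next):
    """One Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[s_next].values())
    Q[s][a] += ALPHA * (r + GAMMA * best_next - Q[s][a])
    return Q[s][a]

# Two hypothetical load states ("hot", "cool") and two actions ("migrate", "stay").
Q = {s: {"migrate": 0.0, "stay": 0.0} for s in ("hot", "cool")}
v = q_update(Q, "hot", "migrate", 1.0, "cool")
print(round(v, 3))  # 0.1 after the first update
```

Repeating such updates while following the greedy policy with occasional exploration is what lets the mechanism adapt the load assignment online.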

Decentralized multi-agent reinforcement learning with networked agents: recent advances Review Article

Kaiqing Zhang, Zhuoran Yang, Tamer Başar,kzhang66@illinois.edu,zy6@princeton.edu,basar1@illinois.edu

Frontiers of Information Technology & Electronic Engineering 2021, Volume 22, Issue 6,   Pages 802-814 doi: 10.1631/FITEE.1900661

Abstract: Multi-agent reinforcement learning (MARL) has long been a significant research topic in both machine learning and control systems. Recent development of (single-agent) deep reinforcement learning has created a resurgence of interest in developing new MARL algorithms, especially those founded on theoretical analysis. In this paper, we review recent advances in a sub-area of this topic: decentralized MARL with networked agents. In this scenario, multiple agents perform sequential decision-making in a common environment without the coordination of any central controller, while being allowed to exchange information with their neighbors over a communication network. Such a setting finds broad applications in the control and operation of robots, unmanned vehicles, mobile sensor networks, and the smart grid. This review covers several of our research endeavors in this direction, as well as progress made by other researchers along this line. We hope that this review promotes additional research efforts in this exciting yet challenging area.

Keywords: Reinforcement learning; Multi-agent systems; Networked systems; Consensus optimization; Distributed optimization; Game theory

Actor–Critic Reinforcement Learning and Application in Developing Computer-Vision-Based Interface Tracking Article

Oguzhan Dogru, Kirubakaran Velswamy, Biao Huang

Engineering 2021, Volume 7, Issue 9,   Pages 1248-1261 doi: 10.1016/j.eng.2021.04.027

Abstract:

This paper synchronizes control theory with computer vision by formalizing object tracking as a sequential decision-making process. A reinforcement learning (RL) agent successfully tracks the interface between two liquids, often a critical variable in the chemical, petrochemical, metallurgical, and oil industries. This method utilizes fewer than 100 images to create an environment, from which the agent generates its own data without the need for expert knowledge. Unlike supervised learning (SL) methods that rely on a huge number of parameters, this approach requires far fewer parameters, which naturally reduces its maintenance cost. Besides its frugal nature, the agent is robust to environmental uncertainties such as occlusion, intensity changes, and excessive noise. From a closed-loop control context, an interface location-based deviation is chosen as the optimization goal during training. The methodology showcases RL for real-time object-tracking applications in the oil sands industry. Along with a presentation of the interface tracking problem, this paper provides a detailed review of one of the most effective RL methodologies: the actor–critic policy.

Keywords: Interface tracking     Object tracking     Occlusion     Reinforcement learning     Uniform manifold approximation and projection    
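The actor–critic scheme this abstract reviews pairs a value-learning critic with a policy-gradient actor: the critic's temporal-difference error doubles as the actor's advantage signal. A minimal tabular sketch of one update step (toy states and numbers, not the paper's vision pipeline):

```python
import math

# One actor–critic step in a two-state toy problem. The critic learns V(s);
# the actor adjusts softmax action preferences using the TD error as advantage.
GAMMA, LR_V, LR_PI = 0.99, 0.5, 0.1

V = {"s0": 0.0, "s1": 0.0}            # critic: state values
logits = {"left": 0.0, "right": 0.0}  # actor: action preferences in s0

def softmax(prefs):
    z = {a: math.exp(p) for a, p in prefs.items()}
    total = sum(z.values())
    return {a: v / total for a, v in z.items()}

# Suppose the agent takes "right" in s0, receives reward 1.0, lands in s1.
r, a = 1.0, "right"
td_error = r + GAMMA * V["s1"] - V["s0"]   # advantage estimate
V["s0"] += LR_V * td_error                 # critic update
pi = softmax(logits)
for act in logits:                         # actor update: policy gradient
    grad = (1.0 if act == a else 0.0) - pi[act]
    logits[act] += LR_PI * td_error * grad

print(round(V["s0"], 3), round(logits["right"], 3))  # 0.5 0.05
```

Deep actor–critic methods replace the two tables with neural networks but keep exactly this update structure.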

A home energy management approach using decoupling value and policy in reinforcement learning

Luolin Xiong, Yang Tang, Chensheng Liu, Shuai Mao, Ke Meng, Zhaoyang Dong, Feng Qian

Frontiers of Information Technology & Electronic Engineering 2023, Volume 24, Issue 9,   Pages 1261-1272 doi: 10.1631/FITEE.2200667

Abstract: Considering the popularity of electric vehicles and the flexibility of household appliances, it is feasible to dispatch energy in home energy systems under dynamic electricity prices to optimize electricity cost while maintaining resident comfort. In this paper, a novel home energy management (HEM) approach is proposed based on a data-driven deep reinforcement learning method. First, to capture the multiple uncertain factors affecting the charging behavior of electric vehicles (EVs), an improved mathematical model integrating the driver's experience, unexpected events, and traffic conditions is introduced to describe the dynamic energy demand of EVs in home energy systems. Second, a decoupled advantage actor-critic (DA2C) algorithm is presented to enhance energy optimization performance by alleviating the overfitting problem caused by shared policy and value networks. Furthermore, separate networks for the policy and value functions ensure the generalization of the proposed method to unseen scenarios. Finally, comprehensive experiments compare the proposed approach with existing methods, and the results show that the proposed method can optimize electricity cost while accounting for residential comfort levels in different scenarios.

Keywords: Home energy system     Electric vehicle     Reinforcement learning     Generalization    

Embedding expert demonstrations into clustering buffer for effective deep reinforcement learning Research Article

Shihmin WANG, Binqi ZHAO, Zhengfeng ZHANG, Junping ZHANG, Jian PU

Frontiers of Information Technology & Electronic Engineering 2023, Volume 24, Issue 11,   Pages 1541-1556 doi: 10.1631/FITEE.2300084

Abstract: As one of the most fundamental topics in reinforcement learning (RL), sample efficiency is essential to the deployment of deep RL algorithms. Unlike most existing exploration methods that sample an action from different types of posterior distributions, we focus on the policy sampling process and propose an efficient selective sampling approach to improve sample efficiency by modeling the internal hierarchy of the environment. Specifically, we first employ clustering methods in the policy sampling process to generate an action candidate set. Then we introduce a clustering buffer for modeling the internal hierarchy, which consists of on-policy data, off-policy data, and expert data, to evaluate actions from the clusters in the action candidate set in the exploration stage. In this way, our approach is able to take advantage of the supervision information in the expert demonstration data. Experiments on six different continuous locomotion environments demonstrate superior performance and faster convergence of selective sampling. In particular, on the LGSVL task, our method can reduce the number of convergence steps by 46.7% and the convergence time by 28.5%. Furthermore, our code is open-source for reproducibility. The code is available at https://github.com/Shihwin/SelectiveSampling.

Keywords: Reinforcement learning     Sample efficiency     Sampling process     Clustering methods     Autonomous driving    

A self-supervised method for treatment recommendation in sepsis Research Articles

Sihan Zhu, Jian Pu,jianpu@fudan.edu.cn

Frontiers of Information Technology & Electronic Engineering 2021, Volume 22, Issue 7,   Pages 926-939 doi: 10.1631/FITEE.2000127

Abstract: Sepsis treatment is a highly challenging effort to reduce mortality in hospital intensive care units, since the treatment response may vary for each patient. Tailored treatment recommendations are desired to assist doctors in making decisions efficiently and accurately. In this work, we apply a self-supervised method based on reinforcement learning (RL) for treatment recommendation on individuals. An uncertainty evaluation method is proposed to separate patient samples into two domains according to their responses to treatments and the state value of the chosen policy. Examples from the two domains are then reconstructed with an auxiliary transfer learning task. A distillation method of privileged learning is tied to a variational auto-encoder framework for the transfer learning task between the low- and high-quality domains. Combined with the self-supervised approach for better state and action representations, we propose a deep RL method called high-risk uncertainty (HRU) control to provide flexibility in the trade-off between the effectiveness and accuracy of ambiguous samples and to reduce the expected mortality. Experiments on the large-scale, publicly available real-world dataset MIMIC-III demonstrate that our model reduces the estimated mortality rate by up to 2.3% in total, and that the estimated mortality rate in the majority of cases is reduced to 9.5%.

Keywords: Treatment recommendation; Sepsis; Self-supervised learning; Reinforcement learning; Electronic medical records

Toward Human-in-the-loop AI: Enhancing Deep Reinforcement Learning Via Real-time Human Guidance for Autonomous Driving Article

Jingda Wu, Zhiyu Huang, Zhongxu Hu, Chen Lv

Engineering 2023, Volume 21, Issue 2,   Pages 75-91 doi: 10.1016/j.eng.2022.05.017

Abstract:

Due to its limited intelligence and capabilities, machine learning is currently unable to handle various situations and thus cannot completely replace humans in real-world applications. Because humans exhibit robustness and adaptability in complex scenarios, it is crucial to introduce humans into the training loop of artificial intelligence (AI), leveraging human intelligence to further advance machine learning algorithms. In this study, a real-time human-guidance-based (Hug) deep reinforcement learning (DRL) method is developed for policy training in an end-to-end autonomous driving case. With our newly designed mechanism for control transfer between humans and automation, humans are able to intervene and correct the agent's unreasonable actions in real time when necessary during the model training process. Based on this human-in-the-loop guidance mechanism, an improved actor-critic architecture with modified policy and value networks is developed. The fast convergence of the proposed Hug-DRL allows real-time human guidance to be fused into the agent's training loop, further improving the efficiency and performance of DRL. The developed method is validated through human-in-the-loop experiments with 40 subjects and compared with other state-of-the-art learning approaches. The results suggest that the proposed method can effectively enhance the training efficiency and performance of the DRL algorithm under human guidance, without imposing specific requirements on participants' expertise or experience.

Keywords: Human-in-the-loop AI     Deep reinforcement learning     Human guidance     Autonomous driving    

Cooperative channel assignment for VANETs based on multiagent reinforcement learning Research Articles

Yun-peng Wang, Kun-xian Zheng, Da-xin Tian, Xu-ting Duan, Jian-shan Zhou,ypwang@buaa.edu.cn,zhengkunxian@buaa.edu.cn,dtian@buaa.edu.cn,duanxuting@buaa.edu.cn

Frontiers of Information Technology & Electronic Engineering 2020, Volume 21, Issue 7,   Pages 1047-1058 doi: 10.1631/FITEE.1900308

Abstract: Dynamic channel assignment (DCA) plays a key role in extending vehicular ad-hoc network capacity and mitigating congestion. However, channel assignment under vehicular direct communication scenarios faces the mutual influence of large-scale nodes, the lack of centralized coordination, unknown global state information, and other challenges. To solve this problem, a multiagent reinforcement learning (RL) based cooperative DCA (RL-CDCA) mechanism is proposed. Specifically, each vehicular node can successfully learn proper strategies for channel selection and backoff adaptation from real-time channel state information (CSI) using two cooperative RL models. In addition, neural networks are constructed as nonlinear Q-function approximators, which facilitates the mapping of the continuously sensed input to the mixed policy output. Nodes are driven to locally share and incorporate their individual rewards so that they can optimize their policies in a distributed, collaborative manner. Simulation results show that the proposed multiagent RL-CDCA reduces the one-hop packet delay by no less than 73.73%, improves the average packet delivery ratio by no less than 12.66% in a highly dense situation, and improves the fairness of global network resource allocation.

Keywords: Vehicular ad-hoc networks     Reinforcement learning     Dynamic channel assignment     Multichannel    

Stochastic pedestrian avoidance for autonomous vehicles using hybrid reinforcement learning Research Article

Huiqian LI, Jin HUANG, Zhong CAO, Diange YANG, Zhihua ZHONG,lihq20@mails.tsinghua.edu.cn,huangjin@tsinghua.edu.cn,caoc15@mails.tsinghua.edu.cn,ydg@tsinghua.edu.cn

Frontiers of Information Technology & Electronic Engineering 2023, Volume 24, Issue 1,   Pages 131-140 doi: 10.1631/FITEE.2200128

Abstract: Ensuring the safety of autonomous vehicles is essential and challenging when pedestrians are involved. Classical pedestrian avoidance strategies cannot handle uncertainty, and learning-based methods lack performance guarantees. In this paper we propose a hybrid reinforcement learning (HRL) approach for autonomous vehicles to safely interact with pedestrians behaving uncertainly. The method integrates a rule-based strategy and a reinforcement learning strategy. The confidence of both strategies is evaluated using the data recorded in the training process. Then we design an activation function to select the final policy with higher confidence. In this way, we can guarantee that the final policy performs no worse than the rule-based policy. To demonstrate the effectiveness of the proposed method, we validate it in simulation using an accelerated testing technique to generate stochastic pedestrians. The results indicate that our approach increases the success rate of pedestrian avoidance to 98.8%, compared with 94.4% for the baseline method.

Keywords: Pedestrian     Hybrid reinforcement learning     Autonomous vehicles     Decision-making    

Coach-assisted multi-agent reinforcement learning framework for unexpected crashed agents Research Article

Jian ZHAO, Youpeng ZHAO, Weixun WANG, Mingyu YANG, Xunhan HU, Wengang ZHOU, Jianye HAO, Houqiang LI

Frontiers of Information Technology & Electronic Engineering 2022, Volume 23, Issue 7,   Pages 1032-1042 doi: 10.1631/FITEE.2100594

Abstract: Multi-agent reinforcement learning is difficult to apply in practice, partially because of the gap between simulated and real-world scenarios. One reason for the gap is that simulated systems always assume that agents can work normally all the time, while in practice one or more agents may unexpectedly "crash" during the coordination process due to inevitable hardware or software failures. Such crashes destroy the cooperation among agents and lead to performance degradation. In this work, we present a formal conceptualization of a cooperative multi-agent system with unexpected crashes. To enhance the robustness of the system to crashes, we propose a coach-assisted multi-agent reinforcement learning framework that introduces a virtual coach agent to adjust the crash rate during training. We have designed three coaching strategies (fixed crash rate, curriculum learning, and adaptive crash rate) and a re-sampling strategy for our coach agent. To our knowledge, this work is the first to study unexpected crashes in a multi-agent system. Extensive experiments on grid-world and StarCraft II micromanagement tasks demonstrate the efficacy of the adaptive strategy compared with the fixed-crash-rate strategy and the curriculum learning strategy. The ablation study further illustrates the effectiveness of our re-sampling strategy.

Keywords: Multi-agent system     Reinforcement learning     Unexpected crashed agents    

Behavioral control task supervisor with memory based on reinforcement learning for human–multi-robot coordination systems Research Article

Jie HUANG, Zhibin MO, Zhenyi ZHANG, Yutao CHEN,yutao.chen@fzu.edu.cn

Frontiers of Information Technology & Electronic Engineering 2022, Volume 23, Issue 8,   Pages 1174-1188 doi: 10.1631/FITEE.2100280

Abstract: In this study, a novel reinforcement-learning-based task supervisor (RLTS) with memory in a behavioral control framework is proposed for human–multi-robot coordination systems (HMRCSs). Existing HMRCSs suffer from high decision-making time cost and large task tracking errors caused by repeated human intervention, which restricts the autonomy of multi-robot systems (MRSs). Moreover, existing task supervisors in the null-space-based behavioral control (NSBC) framework require many manually formulated priority-switching rules, which makes it difficult to realize an optimal behavioral priority adjustment strategy in the case of multiple robots and multiple tasks. The proposed RLTS with memory integrates the deep Q-network (DQN) and long short-term memory (LSTM) within the NSBC framework to achieve an optimal behavioral priority adjustment strategy in the presence of task conflict and to reduce the frequency of human intervention. Specifically, the proposed RLTS with memory begins by memorizing the human intervention history when the robot systems are not confident in emergencies, and then reloads the history information when encountering the same situation previously tackled by humans. Simulation results demonstrate the effectiveness of the proposed RLTS. Finally, an experiment using a group of mobile robots subject to external noise and disturbances validates the effectiveness of the proposed RLTS with memory in uncertain real-world environments.

Keywords: Human–multi-robot coordination systems     Null-space-based behavioral control     Task supervisor     Reinforcement learning     Knowledge base

Optimal Bidding and Operation of a Power Plant with Solvent-Based Carbon Capture under a CO2 Allowance Market: A Solution with a Reinforcement Learning-Based Sarsa Temporal-Difference Algorithm

Ziang Li,Zhengtao Ding,Meihong Wang

Engineering 2017, Volume 3, Issue 2,   Pages 257-265 doi: 10.1016/J.ENG.2017.02.014

Abstract:

In this paper, a reinforcement learning (RL)-based Sarsa temporal-difference (TD) algorithm is applied to search for a unified bidding and operation strategy for a coal-fired power plant with monoethanolamine (MEA)-based post-combustion carbon capture under different carbon dioxide (CO2) allowance market conditions. The objective of the decision maker for the power plant is to maximize the discounted cumulative profit during the power plant lifetime. Two constraints are considered for the objective formulation. Firstly, the tradeoff between the energy-intensive carbon capture and the electricity generation should be made under presumed fixed fuel consumption. Secondly, the CO2 allowances purchased from the CO2 allowance market should be approximately equal to the quantity of CO2 emission from power generation. Three case studies are demonstrated thereafter. In the first case, we show the convergence of the Sarsa TD algorithm and find a deterministic optimal bidding and operation strategy. In the second case, compared with the independently designed operation and bidding strategies discussed in most of the relevant literature, the Sarsa TD-based unified bidding and operation strategy with time-varying flexible market-oriented CO2 capture levels is demonstrated to help the power plant decision maker gain a higher discounted cumulative profit. In the third case, a competitor operating another power plant identical to the preceding plant is considered under the same CO2 allowance market. The competitor also has carbon capture facilities but applies a different strategy to earn profits. The discounted cumulative profits of the two power plants are then compared, thus exhibiting the competitiveness of the power plant that is using the unified bidding and operation strategy explored by the Sarsa TD algorithm.

Keywords: Power plants     Post-combustion carbon capture     Chemical absorption     CO2 allowance market     Optimal decision-making     Reinforcement learning    
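Sarsa is the on-policy counterpart of Q-learning: its temporal-difference target uses the action the policy actually takes next rather than a max over actions. A minimal sketch of the update (states, actions, and constants here are illustrative assumptions, not the paper's plant or market model):

```python
# On-policy Sarsa TD update of the kind used to search bidding/operation
# strategies (toy sketch; the state/action names are hypothetical).
ALPHA, GAMMA = 0.1, 0.95  # learning rate and discount factor

def sarsa_update(Q, s, a, r, s_next, a_next):
    """Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a)) — the target uses the
    action actually taken next (on-policy), unlike Q-learning's max over actions."""
    Q[(s, a)] += ALPHA * (r + GAMMA * Q[(s_next, a_next)] - Q[(s, a)])
    return Q[(s, a)]

Q = {(s, a): 0.0 for s in ("low_co2_price", "high_co2_price")
     for a in ("capture_more", "capture_less")}
v = sarsa_update(Q, "high_co2_price", "capture_more", 2.0,
                 "low_co2_price", "capture_less")
print(round(v, 3))  # 0.2
```

Iterating this update over simulated market episodes is what lets the decision maker converge on a unified bidding and capture-level strategy.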

Multi-agent deep reinforcement learning for end–edge orchestrated resource allocation in industrial wireless networks Research Article

Xiaoyu LIU, Chi XU, Haibin YU, Peng ZENG,liuxiaoyu1@sia.cn,xuchi@sia.cn,yhb@sia.cn,zp@sia.cn

Frontiers of Information Technology & Electronic Engineering 2022, Volume 23, Issue 1,   Pages 47-60 doi: 10.1631/FITEE.2100331

Abstract: Edge artificial intelligence will empower today's simple industrial wireless networks (IWNs) to support complex and dynamic tasks by collaboratively exploiting the computation and communication resources of both machine-type devices (MTDs) and edge servers. In this paper, we propose a multi-agent deep reinforcement learning based resource allocation (MADRL-RA) algorithm for IWNs to support computation-intensive and delay-sensitive applications. First, we present the system model of IWNs, wherein each MTD is regarded as a self-learning agent. Then, we apply the Markov decision process to formulate a minimum system overhead problem with joint optimization of delay and energy consumption. Next, we employ MADRL to defeat the explosive state space and learn an effective resource allocation policy with respect to the computing decision, computation capacity, and transmission power. To break the time correlation of training data while accelerating the learning process of MADRL-RA, we design a weighted experience replay to store and sample experiences categorically. Furthermore, we propose a step-by-step ε-greedy method to balance exploitation and exploration. Finally, we verify the effectiveness of MADRL-RA by comparing it with benchmark algorithms in many experiments, showing that MADRL-RA converges quickly and learns an effective resource allocation policy achieving the minimum system overhead.

Keywords: Multi-agent deep reinforcement learning     End–edge orchestrated     Industrial wireless networks     Delay     Energy consumption    
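A "step-by-step ε-greedy method" as described above can be read as a stepped decay schedule: stay exploratory early, then ratchet ε down toward exploitation. The sketch below illustrates the idea; the decay constants are assumptions for illustration, not values from the paper:

```python
import random

# Stepped epsilon-greedy schedule: halve epsilon every `decay_every` steps,
# floored at eps_min (constants are illustrative, not from the paper).
def epsilon_at(step, eps_start=1.0, eps_min=0.05, decay_every=100, factor=0.5):
    """Epsilon after `step` environment steps under a stepped decay."""
    eps = eps_start * (factor ** (step // decay_every))
    return max(eps, eps_min)

def select_action(q_values, step, rng=random):
    """Explore with probability epsilon, otherwise act greedily on Q-values."""
    if rng.random() < epsilon_at(step):
        return rng.randrange(len(q_values))                      # explore
    return max(range(len(q_values)), key=q_values.__getitem__)   # exploit

print(epsilon_at(0), epsilon_at(250), epsilon_at(10_000))  # 1.0 0.25 0.05
```

Keeping a nonzero floor eps_min preserves a little exploration even late in training, which matters when the wireless environment is non-stationary.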

Proximal policy optimization with an integral compensator for quadrotor control Research

Huan Hu, Qing-ling Wang,qlwang@seu.edu.cn

Frontiers of Information Technology & Electronic Engineering 2020, Volume 21, Issue 5,   Pages 649-808 doi: 10.1631/FITEE.1900641

Abstract: We use the advanced proximal policy optimization (PPO) algorithm to optimize the stochastic control strategy to achieve speed control of a "model-free" quadrotor. The model is controlled by four learned neural networks, which directly map the system states to control commands in an end-to-end style. By introducing an integral compensator into the actor-critic framework, the speed tracking accuracy and robustness are greatly enhanced. In addition, a two-phase learning scheme that includes both offline and online learning is developed for practical use. A model with strong generalization ability is learned in the offline phase. Then, the flight policy of the model is continuously optimized in the online learning phase. Finally, the performance of our proposed algorithm is compared with that of the traditional PID algorithm.

Keywords: Reinforcement learning; Proximal policy optimization; Quadrotor control; Neural networks
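PPO's central idea is the clipped surrogate objective, which bounds how far one update can move the policy from the one that collected the data. A single-sample sketch of that objective (the paper's integral compensator is a separate state augmentation, not shown here):

```python
# PPO's clipped surrogate objective for a single (ratio, advantage) sample.
# ratio = pi_new(a|s) / pi_old(a|s); EPS_CLIP is the usual 0.2 default.
EPS_CLIP = 0.2

def ppo_clip_objective(ratio, advantage):
    """min(r*A, clip(r, 1-eps, 1+eps)*A): the min makes the clip a pessimistic
    bound, so large policy steps gain no extra credit from the objective."""
    clipped = max(min(ratio, 1 + EPS_CLIP), 1 - EPS_CLIP)
    return min(ratio * advantage, clipped * advantage)

# A large ratio with positive advantage is capped at (1 + eps) * A:
print(ppo_clip_objective(1.5, 2.0))   # 2.4
# A small ratio with negative advantage passes through unclipped:
print(ppo_clip_objective(0.9, -1.0))  # -0.9
```

In practice this objective is averaged over a minibatch and maximized by gradient ascent on the policy network's parameters.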
