
A self-supervised method for treatment recommendation in sepsis Research Articles

Sihan Zhu, Jian Pu,jianpu@fudan.edu.cn

Frontiers of Information Technology & Electronic Engineering 2021, Volume 22, Issue 7,   Pages 926-939 doi: 10.1631/FITEE.2000127

Abstract: Sepsis treatment is a highly challenging effort to reduce mortality in hospital intensive care units since the treatment response may vary for each patient. Tailored treatment recommendation systems are desired to assist doctors in making decisions efficiently and accurately. In this work, we apply a self-supervised method based on reinforcement learning (RL) for treatment recommendation on individuals. An uncertainty evaluation method is proposed to separate patient samples into two domains according to their responses to treatments and the state value of the chosen policy. Examples of two domains are then reconstructed with an auxiliary transfer learning task. A distillation method of privilege learning is tied to a variational auto-encoder framework for the transfer learning task between the low- and high-quality domains. Combined with the self-supervised way for better state and action representations, we propose a deep RL method called high-risk uncertainty (HRU) control to provide flexibility on the trade-off between the effectiveness and accuracy of ambiguous samples and to reduce the expected mortality. Experiments on the large-scale publicly available real-world dataset MIMIC-III demonstrate that our model reduces the estimated mortality rate by up to 2.3% in total, and that the estimated mortality rate in the majority of cases is reduced to 9.5%.

Keywords: Treatment recommendation     Sepsis     Self-supervised learning     Reinforcement learning     Electronic medical records    

Pre-training with asynchronous supervised learning for reinforcement learning based autonomous driving Research Articles

Yunpeng Wang, Kunxian Zheng, Daxin Tian, Xuting Duan, Jianshan Zhou,ypwang@buaa.edu.cn,zhengkunxian@buaa.edu.cn,dtian@buaa.edu.cn,duanxuting@buaa.edu.cn

Frontiers of Information Technology & Electronic Engineering 2021, Volume 22, Issue 5,   Pages 615-766 doi: 10.1631/FITEE.1900637

Abstract: Rule-based autonomous driving systems may suffer from increased complexity with large-scale inter-coupled rules, so many researchers are exploring learning-based approaches. Reinforcement learning (RL) has been applied in designing autonomous driving systems because of its outstanding performance on a wide variety of sequential control problems. However, poor initial performance is a major challenge to the practical implementation of an RL-based autonomous driving system. RL training requires extensive training data before the model achieves reasonable performance, making an RL-based model inapplicable in a real-world setting, particularly when data are expensive. We propose an asynchronous supervised learning (ASL) method for the RL-based end-to-end autonomous driving model to address the problem of poor initial performance before training this RL-based model in real-world settings. Specifically, prior knowledge is introduced in the ASL pre-training stage by asynchronously executing multiple supervised learning processes in parallel, on multiple driving demonstration data sets. After pre-training, the model is deployed on a real vehicle to be further trained by RL to adapt to the real environment and continuously break the performance limit. The presented pre-training method is evaluated on the race car simulator, TORCS (The Open Racing Car Simulator), to verify that it can be sufficiently reliable in improving the initial performance and convergence speed of an end-to-end autonomous driving model in the RL training stage. In addition, a real-vehicle verification system is built to verify the feasibility of the proposed pre-training method in a real-vehicle deployment. Simulation results show that using some demonstrations during a supervised pre-training stage allows significant improvements in initial performance and convergence speed in the RL training stage.

Keywords: Autonomous driving     Autonomous vehicles     Reinforcement learning     Supervised learning    

Self-supervised graph learning with target-adaptive masking for session-based recommendation Research Article

Yitong WANG, Fei CAI, Zhiqiang PAN, Chengyu SONG,wangyitong20@nudt.edu.cn,caifei08@nudt.edu.cn,panzhiqiang@nudt.edu.cn,songchengyu@nudt.edu.cn

Frontiers of Information Technology & Electronic Engineering 2023, Volume 24, Issue 1,   Pages 73-87 doi: 10.1631/FITEE.2200137

Abstract: Session-based recommendation aims to predict the next item based on a user's limited interactions within a short period. Existing approaches use mainly recurrent neural networks (RNNs) or graph neural networks (GNNs) to model the sequential patterns or the transition relationships between items. However, such models either ignore the over-smoothing issue of GNNs, or directly use cross-entropy loss with a softmax layer for model optimization, which easily results in the over-fitting problem. To tackle the above issues, we propose a self-supervised graph learning with target-adaptive masking (SGL-TM) method. Specifically, we first construct a global graph based on all involved sessions and subsequently capture the self-supervised signals from the global connections between items, which helps supervise the model in generating accurate representations of items in the ongoing session. After that, we calculate the main supervised loss by comparing the ground truth with the predicted scores of items adjusted by our designed module. Finally, we combine the main supervised component with the auxiliary self-supervision module to obtain the final loss for optimizing the model parameters. Extensive experimental results from two benchmark datasets, Gowalla and Diginetica, indicate that SGL-TM can outperform state-of-the-art baselines in terms of Recall@20 and MRR@20, especially in short sessions.

Keywords: Session-based recommendation     Self-supervised learning     Graph neural networks     Target-adaptive masking    
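
The abstract above reports results in terms of Recall@20 and MRR@20. As a rough illustration only, the following sketch computes both metrics under their commonly used definitions (hit rate of the true next item within the top K, and mean reciprocal rank truncated at K). The function names and toy data are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def recall_at_k(ranked: Sequence[Sequence[int]], targets: Sequence[int], k: int = 20) -> float:
    """Fraction of sessions whose true next item appears in the top-k list."""
    hits = sum(1 for items, target in zip(ranked, targets) if target in items[:k])
    return hits / len(targets)


def mrr_at_k(ranked: Sequence[Sequence[int]], targets: Sequence[int], k: int = 20) -> float:
    """Mean reciprocal rank of the true next item; contributes 0 if it is outside the top k."""
    total = 0.0
    for items, target in zip(ranked, targets):
        top = list(items[:k])
        if target in top:
            total += 1.0 / (top.index(target) + 1)  # ranks are 1-based
    return total / len(targets)


# Toy example (assumed data): two sessions, item ids ranked by predicted score.
ranked_lists = [[3, 1, 2], [5, 4, 6]]
true_next_items = [1, 7]
recall = recall_at_k(ranked_lists, true_next_items, k=2)
mrr = mrr_at_k(ranked_lists, true_next_items, k=2)
```

In the toy data, the first session's target sits at rank 2 and the second session's target is never recommended, so only the first session contributes to either metric.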

Learning to select pseudo labels: a semi-supervised method for named entity recognition Research Articles

Zhen-zhen Li, Da-wei Feng, Dong-sheng Li, Xi-cheng Lu,lizhenzhen14@nudt.edu.cn,davyfeng.c@gmail.com,dsli@nudt.edu.cn,xclu@nudt.edu.cn

Frontiers of Information Technology & Electronic Engineering 2020, Volume 21, Issue 6,   Pages 809-962 doi: 10.1631/FITEE.1800743

Abstract: Deep learning models have achieved state-of-the-art performance in named entity recognition (NER); the good performance, however, relies heavily on substantial amounts of labeled data. In some specific areas such as medical, financial, and military domains, labeled data is very scarce, while unlabeled data is readily available. Previous studies have used unlabeled data to enrich word representations, but a large amount of entity information in unlabeled data is neglected, which may be beneficial to the NER task. In this study, we propose a semi-supervised method for NER tasks, which learns to create high-quality labeled data by applying a pre-trained module to filter out erroneous pseudo labels. Pseudo labels are automatically generated for unlabeled data and used as if they were true labels. Our semi-supervised framework includes three steps: constructing an optimal single neural model for a specific NER task, learning a module that evaluates pseudo labels, and creating new labeled data and improving the NER model iteratively. Experimental results on two English NER tasks and one Chinese clinical NER task demonstrate that our method further improves the performance of the best single neural model. Even when we use only pre-trained static word embeddings and do not rely on any external knowledge, our method achieves comparable performance to those state-of-the-art models on the CoNLL-2003 and OntoNotes 5.0 English NER tasks.

Keywords: Named entity recognition     Unlabeled data     Deep learning     Semi-supervised learning method    

Interactive medical image segmentation with self-adaptive confidence calibration

沈楚云,李文浩,徐琪森,胡斌,金博,蔡海滨,朱凤平,李郁欣,王祥丰

Frontiers of Information Technology & Electronic Engineering 2023, Volume 24, Issue 9,   Pages 1332-1348 doi: 10.1631/FITEE.2200299

Abstract: Interactive medical image segmentation based on human-in-the-loop machine learning is a novel paradigm that draws on human expert knowledge to assist medical image segmentation. However, existing methods often fall into what we call interactive misunderstanding, the essence of which is the dilemma in trading off short- and long-term interaction information. To better use the interaction information at various timescales, we propose an interactive segmentation framework, called interactive MEdical image segmentation with self-adaptive Confidence CAlibration (MECCA), which combines action-based confidence learning and multi-agent reinforcement learning. A novel confidence network is learned by predicting the alignment level of the action with short-term interaction information. A confidence-based reward-shaping mechanism is then proposed to explicitly incorporate confidence in the policy gradient calculation, thus directly correcting the model’s interactive misunderstanding. MECCA also enables user-friendly interactions by reducing the interaction intensity and difficulty via label generation and interaction guidance, respectively. Numerical experiments on different segmentation tasks show that MECCA can significantly improve short- and long-term interaction information utilization efficiency with remarkably fewer labeled samples. The demo video is available at https://bit.ly/mecca-demo-video.

Keywords: Medical image segmentation     Interactive segmentation     Multi-agent reinforcement learning     Confidence learning     Semi-supervised learning    

NGAT: attention in breadth and depth exploration for semi-supervised graph representation learning Research Articles

Jianke HU, Yin ZHANG,yinzh@zju.edu.cn

Frontiers of Information Technology & Electronic Engineering 2022, Volume 23, Issue 3,   Pages 409-421 doi: 10.1631/FITEE.2000657

Abstract: Recently, graph neural networks (GNNs) have achieved remarkable performance in representation learning on graph-structured data. However, as the number of network layers increases, GNNs based on the neighborhood aggregation strategy deteriorate due to the problem of oversmoothing, which is the major bottleneck for applying GNNs to real-world graphs. Many efforts have been made to improve the process of feature information aggregation from directly connected nodes, i.e., breadth exploration. However, these models perform the best only in the case of three or fewer layers, and the performance drops rapidly for deep layers. To alleviate oversmoothing, we propose a nested graph attention network (NGAT), which can work in a semi-supervised manner. In addition to breadth exploration, a -layer NGAT uses a layer-wise aggregation strategy guided by the attention mechanism to selectively leverage feature information from the -order neighborhood, i.e., depth exploration. Even with a 10-layer or deeper architecture, NGAT can balance the need for preserving the locality (including root node features and the local structure) and aggregating the information from a large neighborhood. In a number of experiments on standard tasks, NGAT outperforms other novel models and achieves state-of-the-art performance.

Keywords: Graph learning     Semi-supervised learning     Node classification     Attention    

Federated unsupervised representation learning Research Article

Fengda ZHANG, Kun KUANG, Long CHEN, Zhaoyang YOU, Tao SHEN, Jun XIAO, Yin ZHANG, Chao WU, Fei WU, Yueting ZHUANG, Xiaolin LI,fdzhang@zju.edu.cn,kunkuang@zju.edu.cn

Frontiers of Information Technology & Electronic Engineering 2023, Volume 24, Issue 8,   Pages 1181-1193 doi: 10.1631/FITEE.2200268

Abstract: To leverage the enormous amount of unlabeled data on distributed edge devices, we formulate a new problem in federated learning called federated unsupervised representation learning (FURL) to learn a common representation model without supervision while preserving data privacy. FURL poses two new challenges: (1) data distribution shift (non-independent and identically distributed, non-IID) among clients would make local models focus on different categories, leading to the inconsistency of representation spaces; (2) without unified information among the clients in FURL, the representations across clients would be misaligned. To address these challenges, we propose the federated contrastive averaging with dictionary and alignment (FedCA) algorithm. FedCA is composed of two key modules: a dictionary module to aggregate the representations of samples from each client which can be shared with all clients for consistency of representation space and an alignment module to align the representation of each client on a base model trained on public data. We adopt the contrastive approach for local model training. Through extensive experiments with three evaluation protocols in IID and non-IID settings, we demonstrate that FedCA outperforms all baselines with significant margins.

Keywords: Federated learning     Unsupervised learning     Representation learning     Contrastive learning    

Representation learning via a semi-supervised stacked distance autoencoder for image classification Research Articles

Liang Hou, Xiao-yi Luo, Zi-yang Wang, Jun Liang,jliang@zju.edu.cn

Frontiers of Information Technology & Electronic Engineering 2020, Volume 21, Issue 7,   Pages 963-1118 doi: 10.1631/FITEE.1900116

Abstract: Image classification is an important application of deep learning. In a typical classification task, the classification accuracy is strongly related to the features that are extracted via deep learning methods. An autoencoder is a special type of neural network, often used for dimensionality reduction and feature extraction. The proposed method is based on the traditional autoencoder, incorporating the "distance" information between samples from different categories. The model is called a semi-supervised distance autoencoder. Each layer is first pre-trained in an unsupervised manner. In the subsequent supervised training, the optimized parameters are set as the initial values. To obtain more suitable features, we use a stacked model to replace the basic autoencoder structure with a single hidden layer. A series of experiments are carried out to test the performance of different models on several datasets, including the MNIST dataset, street view house numbers (SVHN) dataset, German traffic sign recognition benchmark (GTSRB), and CIFAR-10 dataset. The proposed semi-supervised distance autoencoder method is compared with the traditional autoencoder, sparse autoencoder, and supervised autoencoder. Experimental results verify the effectiveness of the proposed model.

Keywords: Autoencoder     Image classification     Semi-supervised learning     Neural network    

Correspondence: Uncertainty-aware complementary label queries for active learning Perspective

Shengyuan LIU, Ke CHEN, Tianlei HU, Yunqing MAO,liushengyuan@zju.edu.cn,chenk@cs.zju.edu.cn,htl@zju.edu.cn,myq@citycloud.com.cn

Frontiers of Information Technology & Electronic Engineering 2023, Volume 24, Issue 10,   Pages 1497-1503 doi: 10.1631/FITEE.2200589

Abstract: Many active learning methods assume that a learner can simply ask for the full annotations of some training data from annotators. These methods mainly try to cut the annotation costs by minimizing the number of annotation actions. Unfortunately, annotating instances exactly in many real-world classification tasks is still expensive. To reduce the cost of a single annotation action, we try to tackle a novel active learning setting, named active learning with complementary labels (ALCL). ALCL learners ask only yes/no questions in some classes. After receiving answers from annotators, ALCL learners obtain a few supervised instances and more training instances with complementary labels, which specify only one of the classes to which the pattern does not belong. There are two challenging issues in ALCL: one is how to sample instances to be queried, and the other is how to learn from these complementary labels and ordinary accurate labels. For the first issue, we propose an uncertainty-based sampling strategy under this novel setup. For the second issue, we upgrade a previous ALCL method to fit our sampling strategy. Experimental results on various datasets demonstrate the superiority of our approaches.

Keywords: Active learning     Image classification     Weakly supervised learning    
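
To make the query setting described in the abstract above concrete, here is a minimal sketch of one yes/no query round. It assumes entropy as the uncertainty score and asks about the model's top predicted class; both choices are illustrative assumptions, not the authors' sampling strategy, and all names are hypothetical. A "yes" answer yields an ordinary label, while a "no" answer yields a complementary label (a class the instance does not belong to).

```python
import math
from typing import List, Sequence, Tuple


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of a predicted class distribution (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def query_round(pred_probs: List[Sequence[float]], oracle_labels: Sequence[int]) -> Tuple[int, str, int]:
    """Pick the most uncertain instance and ask one yes/no question about it.

    Returns (instance index, label kind, class id), where label kind is
    "ordinary" if the annotator answers yes and "complementary" otherwise.
    """
    # Assumed uncertainty criterion: highest predictive entropy.
    idx = max(range(len(pred_probs)), key=lambda i: entropy(pred_probs[i]))
    # Assumed question: "does this instance belong to the top predicted class?"
    asked = max(range(len(pred_probs[idx])), key=lambda c: pred_probs[idx][c])
    if oracle_labels[idx] == asked:
        return idx, "ordinary", asked
    return idx, "complementary", asked


# Toy example (assumed data): two instances, three classes.
predictions = [[0.9, 0.05, 0.05], [0.4, 0.3, 0.3]]
result = query_round(predictions, oracle_labels=[0, 2])
```

Here the second instance has the flatter distribution, so it is queried; the annotator answers "no" to class 0, and the learner records class 0 as a complementary label for that instance.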

MDLB: a metadata dynamic load balancing mechanism based on reinforcement learning Research Articles

Zhao-qi Wu, Jin Wei, Fan Zhang, Wei Guo, Guang-wei Xie,17034203@qq.com

Frontiers of Information Technology & Electronic Engineering 2020, Volume 21, Issue 7,   Pages 963-1118 doi: 10.1631/FITEE.1900121

Abstract: With the growing amount of information and data, object-oriented storage systems have been widely used in many applications, including the Google File System, Amazon S3, Hadoop Distributed File System, and Ceph, in which load balancing of metadata plays an important role in improving the input/output performance of the entire system. Unbalanced load on the metadata server leads to a serious bottleneck problem for system performance. However, most existing metadata load balancing strategies, which are based on subtree segmentation or hashing, lack good dynamics and adaptability. In this study, we propose a metadata dynamic load balancing (MDLB) mechanism based on reinforcement learning (RL). We learn that the Q_learning algorithm and our RL-based strategy consist of three modules, i.e., the policy selection network, load balancing network, and parameter update network. Experimental results show that the proposed MDLB algorithm can adjust the load dynamically according to the performance of the metadata servers, and that it has good adaptability in the case of sudden change of data volume.

Keywords: Object-oriented storage system     Metadata     Dynamic load balancing     Reinforcement learning     Q_learning    

Actor–Critic Reinforcement Learning and Application in Developing Computer-Vision-Based Interface Tracking Article

Oguzhan Dogru, Kirubakaran Velswamy, Biao Huang

Engineering 2021, Volume 7, Issue 9,   Pages 1248-1261 doi: 10.1016/j.eng.2021.04.027

Abstract:

This paper synchronizes control theory with computer vision by formalizing object tracking as a sequential decision-making process. A reinforcement learning (RL) agent successfully tracks an interface between two liquids, which is often a critical variable to track in many chemical, petrochemical, metallurgical, and oil industries. This method utilizes less than 100 images for creating an environment, from which the agent generates its own data without the need for expert knowledge. Unlike supervised learning (SL) methods that rely on a huge number of parameters, this approach requires far fewer parameters, which naturally reduces its maintenance cost. Besides its frugal nature, the agent is robust to environmental uncertainties such as occlusion, intensity changes, and excessive noise. From a closed-loop control context, an interface location-based deviation is chosen as the optimization goal during training. The methodology showcases RL for real-time object-tracking applications in the oil sands industry. Along with a presentation of the interface tracking problem, this paper provides a detailed review of one of the most effective RL methodologies: actor–critic policy.

Keywords: Interface tracking     Object tracking     Occlusion     Reinforcement learning     Uniform manifold approximation and projection    

Non-IID Recommender Systems: A Review and Framework of Recommendation Paradigm Shifting Article

Longbing Cao

Engineering 2016, Volume 2, Issue 2,   Pages 212-224 doi: 10.1016/J.ENG.2016.02.013

Abstract:

While recommendation plays an increasingly critical role in our living, study, work, and entertainment, the recommendations we receive are often for irrelevant, duplicate, or uninteresting products and services. A critical reason for such bad recommendations lies in the intrinsic assumption that recommended users and items are independent and identically distributed (IID) in existing theories and systems. Another phenomenon is that, while tremendous efforts have been made to model specific aspects of users or items, the overall user and item characteristics and their non-IIDness have been overlooked. In this paper, the non-IID nature and characteristics of recommendation are discussed, followed by the non-IID theoretical framework in order to build a deep and comprehensive understanding of the intrinsic nature of recommendation problems, from the perspective of both couplings and heterogeneity. This non-IID recommendation research triggers the paradigm shift from IID to non-IID recommendation research and can hopefully deliver informed, relevant, personalized, and actionable recommendations. It creates exciting new directions and fundamental solutions to address various complexities including cold-start, sparse data-based, cross-domain, group-based, and shilling attack-related issues.

Keywords: Independent and identically distributed (IID)     Non-IID     Heterogeneity     Coupling relationship     Coupling learning     Relational learning     IIDness learning     Non-IIDness learning     Recommender system     Recommendation     Non-IID recommendation    

Decentralized multi-agent reinforcement learning with networked agents: recent advances Review Article

Kaiqing Zhang, Zhuoran Yang, Tamer Başar,kzhang66@illinois.edu,zy6@princeton.edu,basar1@illinois.edu

Frontiers of Information Technology & Electronic Engineering 2021, Volume 22, Issue 6,   Pages 802-814 doi: 10.1631/FITEE.1900661

Abstract: Multi-agent reinforcement learning (MARL) has long been a significant research topic in both machine learning and control systems. Recent development of (single-agent) deep reinforcement learning has created a resurgence of interest in developing new MARL algorithms, especially those founded on theoretical analysis. In this paper, we review recent advances on a sub-area of this topic: decentralized MARL with networked agents. In this scenario, multiple agents perform sequential decision-making in a common environment, and without the coordination of any central controller, while being allowed to exchange information with their neighbors over a communication network. Such a setting finds broad applications in the control and operation of robots, unmanned vehicles, mobile sensor networks, and the smart grid. This review covers several of our research endeavors in this direction, as well as progress made by other researchers along the line. We hope that this review promotes additional research efforts in this exciting yet challenging area.

Keywords: Reinforcement learning     Multi-agent systems     Networked systems     Consensus optimization     Distributed optimization     Game theory    

Embedding expert demonstrations into clustering buffer for effective deep reinforcement learning Research Article

Shihmin WANG, Binqi ZHAO, Zhengfeng ZHANG, Junping ZHANG, Jian PU

Frontiers of Information Technology & Electronic Engineering 2023, Volume 24, Issue 11,   Pages 1541-1556 doi: 10.1631/FITEE.2300084

Abstract: As one of the most fundamental topics in reinforcement learning (RL), sample efficiency is essential to the deployment of deep RL algorithms. Unlike most existing exploration methods that sample an action from different types of posterior distributions, we focus on the policy sampling process and propose an efficient selective sampling approach to improve sample efficiency by modeling the internal hierarchy of the environment. Specifically, we first employ clustering methods in the policy sampling process to generate an action candidate set. Then we introduce a clustering buffer for modeling the internal hierarchy, which consists of on-policy data, off-policy data, and expert data to evaluate actions from the clusters in the action candidate set in the exploration stage. In this way, our approach is able to take advantage of the supervision information in the expert demonstration data. Experiments on six different continuous locomotion environments demonstrate superior performance and faster convergence of selective sampling. In particular, on the LGSVL task, our method can reduce the number of convergence steps by 46.7% and the convergence time by 28.5%. Furthermore, our code is open-source for reproducibility. The code is available at https://github.com/Shihwin/SelectiveSampling.

Keywords: Reinforcement learning     Sample efficiency     Sampling process     Clustering methods     Autonomous driving    

A home energy management approach using decoupling value and policy in reinforcement learning

熊珞琳,唐漾,刘臣胜,毛帅,孟科,董朝阳,钱锋

Frontiers of Information Technology & Electronic Engineering 2023, Volume 24, Issue 9,   Pages 1261-1272 doi: 10.1631/FITEE.2200667

Abstract: Considering the popularity of electric vehicles and the flexibility of household appliances, it is feasible to dispatch energy in home energy systems under dynamic electricity prices to optimize electricity cost and comfort residents. In this paper, a novel home energy management (HEM) approach is proposed based on a data-driven deep reinforcement learning method. First, to reveal the multiple uncertain factors affecting the charging behavior of electric vehicles (EVs), an improved mathematical model integrating driver’s experience, unexpected events, and traffic conditions is introduced to describe the dynamic energy demand of EVs in home energy systems. Second, a decoupled advantage actor-critic (DA2C) algorithm is presented to enhance the energy optimization performance by alleviating the overfitting problem caused by the shared policy and value networks. Furthermore, separate networks for the policy and value functions ensure the generalization of the proposed method in unseen scenarios. Finally, comprehensive experiments are carried out to compare the proposed approach with existing methods, and the results show that the proposed method can optimize electricity cost and consider the residential comfort level in different scenarios.

Keywords: Home energy system     Electric vehicle     Reinforcement learning     Generalization    
