Search | Engineering



A new automatic convolutional neural network based on deep reinforcement learning for fault diagnosis

Frontiers of Mechanical Engineering, 2022, Vol. 17, No. 2, doi: 10.1007/s11465-022-0673-7

Abstract: Convolutional neural networks (CNNs) have been applied with remarkable success to fault diagnosis. However, tuning a CNN to obtain a well-trained model is still mainly a manual search. Tuning requires considerable experience with both CNN training and fault diagnosis and is time-consuming and labor-intensive, which makes automatic hyperparameter optimization (HPO) of CNN models essential. To solve this problem, this paper proposes a novel automatic CNN (ACNN) for fault diagnosis that automatically tunes three key hyperparameters: learning rate, batch size, and L2 regularization. First, a new deep reinforcement learning (DRL) method is developed; it constructs an agent that controls these three hyperparameters online as the CNN model trains. Second, a new DRL structure is designed by combining deep deterministic policy gradient and long short-term memory; it takes the training loss of the CNN model as input and outputs adjustments to the three hyperparameters. Third, a new training method for ACNN is designed to enhance its stability. Two well-known bearing datasets are used to evaluate ACNN. It is compared with four commonly used HPO methods: random search, Bayesian optimization, tree Parzen estimator, and sequential model-based algorithm configuration. ACNN is also compared with other published machine learning (ML) and deep learning (DL) methods. The results show that ACNN outperforms these HPO and ML/DL methods, validating its potential in fault diagnosis.

Keywords: deep reinforcement learning; hyperparameter optimization; convolutional neural network; fault diagnosis


Automated synthesis of steady-state continuous processes using reinforcement learning

Frontiers of Chemical Science and Engineering, 2022, Vol. 16, No. 2, pp. 288-302, doi: 10.1007/s11705-021-2055-9

Abstract: Automated flowsheet synthesis is an important field in computer-aided process engineering. The present work demonstrates how reinforcement learning can be used for automated flowsheet synthesis without any heuristics or prior knowledge of conceptual design. The environment consists of a steady-state flowsheet simulator that contains all physical knowledge. An agent is trained to take discrete actions and sequentially build up flowsheets that solve a given process problem. A novel method named SynGameZero is developed to ensure good exploration in this complex problem. In it, flowsheet synthesis is modelled as a game of two competing players. The agent plays this game against itself during training and consists of an artificial neural network and a tree search for forward planning. The method is applied successfully to a reaction-distillation process in a quaternary system.

Keywords: automated process synthesis; flowsheet synthesis; artificial intelligence; machine learning; reinforcement learning


Deep reinforcement learning-based critical element identification and demolition planning of frame structures

Shaojun ZHU; Makoto OHSAKI; Kazuki HAYASHI; Shaohan ZONG; Xiaonong GUO

Frontiers of Structural and Civil Engineering, 2022, Vol. 16, No. 11, pp. 1397-1414, doi: 10.1007/s11709-022-0860-y

Abstract: This paper proposes a framework for critical element identification and demolition planning of frame structures. Innovative quantitative indices that consider the severity of the ultimate collapse scenario are proposed using reinforcement learning and graph embedding. The action is defined as removing an element, and the state is described by integrating the joint and element features into a comprehensive feature vector for each element. By establishing the policy network, the agent outputs the Q value for each action after observing the state. Numerical examples confirm that the trained agent can provide an accurate estimation of the Q values and can handle problems with different action spaces owing to the use of graph embedding. In addition, different behaviors can be learned by varying hyperparameters in the reward function. A comparison of the proposed method with conventional sensitivity index-based methods demonstrates that the computational cost is considerably reduced because the reinforcement learning model is trained offline. It is also shown that the Q values produced by the reinforcement learning agent can make up for the deficiencies of existing indices and can be used directly as the quantitative index for determining the most expected collapse scenario, i.e., the sequence of element removals.

Keywords: progressive collapse; alternate load path; demolition planning; reinforcement learning; graph embedding


Recent development on statistical methods for personalized medicine discovery

Frontiers of Medicine, 2013, Vol. 7, No. 1, pp. 102-110, doi: 10.1007/s11684-013-0245-7

Abstract:

It is well documented that patients can show significant heterogeneous responses to treatments so the best treatment strategies may require adaptation over individuals and time. Recently, a number of new statistical methods have been developed to tackle the important problem of estimating personalized treatment rules using single-stage or multiple-stage clinical data. In this paper, we provide an overview of these methods and list a number of challenges.

Keywords: dynamic treatment regimes; personalized medicine; reinforcement learning; Q-learning


Toward trustworthy decision-making for autonomous driving: a robust reinforcement learning approach with safety guarantees

何祥坤, 黄文辉, 吕辰

Engineering, 2024, Vol. 33, No. 2, pp. 77-89, doi: 10.1016/j.eng.2023.10.005

Abstract:

While autonomous vehicles are vital components of intelligent transportation systems, ensuring the trustworthiness of decision-making remains a substantial challenge in realizing autonomous driving. Therefore, we present a novel robust reinforcement learning approach with safety guarantees to attain trustworthy decision-making for autonomous vehicles. The proposed technique ensures decision trustworthiness in terms of policy robustness and collision safety. Specifically, an adversary model is learned online to simulate the worst-case uncertainty by approximating the optimal adversarial perturbations on the observed states and environmental dynamics. In addition, an adversarial robust actor-critic algorithm is developed to enable the agent to learn robust policies against perturbations in observations and dynamics. Moreover, we devise a safety mask to guarantee the collision safety of the autonomous driving agent during both the training and testing processes using an interpretable knowledge model known as the Responsibility-Sensitive Safety Model. Finally, the proposed approach is evaluated through both simulations and experiments. These results indicate that the autonomous driving agent can make trustworthy decisions and drastically reduce the number of collisions through robust safety policies.

Keywords: Autonomous vehicle; Decision-making; Reinforcement learning; Adversarial attack; Safety guarantee
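The safety mask mentioned in the abstract above can be pictured as a filter applied to the agent's candidate actions before one is executed. The following is only a minimal sketch of generic action masking, not the paper's method: the function name `masked_greedy`, the discrete action set, and the precomputed `safe` flags are all assumptions made for illustration. In the paper, safety is judged by the Responsibility-Sensitive Safety model, which is not reproduced here.

```python
import math
from typing import Sequence


def masked_greedy(scores: Sequence[float], safe: Sequence[bool]) -> int:
    """Return the index of the highest-scoring action among those flagged safe.

    `scores` are hypothetical action preferences from a learned policy;
    `safe` is a hypothetical per-action flag produced by some external safety check.
    """
    if len(scores) != len(safe):
        raise ValueError("scores and safe must have the same length")
    best_index, best_score = -1, -math.inf
    for index, (score, is_safe) in enumerate(zip(scores, safe)):
        # Unsafe actions are skipped, so they can never be selected.
        if is_safe and score > best_score:
            best_index, best_score = index, score
    if best_index < 0:
        raise ValueError("no safe action available")
    return best_index
```

With scores `[1.0, 3.0, 2.0]` and flags `[True, False, True]`, the highest-scoring action (index 1) is masked out and index 2 is chosen instead.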


Actor-critic reinforcement learning and its application to developing computer-vision-based interface tracking (Article)

Oguzhan Dogru, Kirubakaran Velswamy, 黄彪

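Engineering, 2021, Vol. 7, No. 9, pp. 1248-1261, doi: 10.1016/j.eng.2021.04.027

Abstract:

This paper brings control theory and computer vision together by formulating object tracking as a sequential decision-making process. A reinforcement learning (RL) agent successfully tracks the interface between two liquids, which is often a critical variable to track in the chemical, petrochemical, metallurgical, and oil industries. The method uses fewer than 100 images to create an environment, from which the agent generates its own data without expert knowledge. Unlike supervised learning (SL) methods that rely on a large number of parameters, this approach requires far fewer parameters, which naturally reduces maintenance cost. Besides being economical, the agent is robust to environmental uncertainties such as occlusion, intensity changes, and excessive noise. In a closed-loop control setting, the deviation based on the interface location is chosen as the optimization objective during training. The method demonstrates the application of RL to real-time object tracking in the oil sands industry. In addition to presenting the interface tracking problem, the paper reviews in detail one of the most effective RL methods, the actor-critic policy.

Keywords: interface tracking; object tracking; occlusion; reinforcement learning; uniform manifold approximation and projection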


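Efficient deep reinforcement learning based on a clustered experience buffer of expert demonstrations (Research Article)

王士珉, 赵彬琦, 张政锋, 张军平, 浦剑

Frontiers of Information Technology & Electronic Engineering, 2023, Vol. 24, No. 11, pp. 1541-1556, doi: 10.1631/FITEE.2300084

Abstract: As one of the most fundamental topics in reinforcement learning, sample efficiency is essential to the deployment of deep reinforcement learning algorithms. Unlike most existing exploration methods, which sample actions from different types of posterior distributions, we focus on the policy's sampling process and propose an efficient selective sampling approach that improves sample efficiency by modeling the internal hierarchical structure of the environment. Specifically, a clustering method is first used during policy sampling to generate an action candidate set. A clustered buffer for modeling the internal hierarchy is then introduced; it consists of on-policy data, off-policy data, and expert data, and is used to evaluate the value of the different categories of actions in the candidate set during exploration. In this way, the approach makes fuller use of the supervision information in the expert demonstration data. Experiments on six different continuous locomotion environments show that selective sampling achieves superior reinforcement learning performance and faster convergence. In particular, on the LGSVL task, the method reduces the number of convergence steps by 46.7% and the convergence time by 28.5%. The code is open source at https://github.com/Shihwin/SelectiveSampling.

Keywords: reinforcement learning; sample efficiency; sampling process; clustering methods; autonomous driving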


A home energy management approach based on decoupled value and policy reinforcement learning

熊珞琳,唐漾,刘臣胜,毛帅,孟科,董朝阳,钱锋

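Frontiers of Information Technology & Electronic Engineering, 2023, Vol. 24, No. 9, pp. 1261-1272, doi: 10.1631/FITEE.2200667

Abstract: Given the popularity of electric vehicles (EVs) and the flexibility of household appliances, it is feasible to schedule energy in a home energy system under dynamic electricity prices so as to optimize electricity cost and safeguard residents' comfort. This paper proposes a data-driven deep reinforcement learning approach to home energy management. First, to capture the multiple uncertain factors that affect EV charging behavior, an improved mathematical model that combines driver experience, unexpected events, and traffic conditions is introduced to describe the dynamic energy demand of EVs in the home energy system. Second, a decoupled advantage actor-critic (DA2C) algorithm is proposed that improves energy optimization performance by alleviating the overfitting caused by sharing a network between the policy and the value function. Furthermore, the decoupled networks for the policy and value functions ensure that the proposed method generalizes to unseen scenarios. Finally, the proposed method is compared with existing methods in comprehensive experiments. The results show that it optimizes electricity cost while maintaining residential comfort across different scenarios.

Keywords: home energy system; electric vehicle; reinforcement learning; generalization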


Human-in-the-loop deep reinforcement learning and its application to intelligent decision-making in autonomous driving (Article)

吴京达, 黄志宇, 胡中旭, 吕辰

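Engineering, 2023, Vol. 21, No. 2, pp. 75-91, doi: 10.1016/j.eng.2022.05.017

Abstract:

Because the intelligence and capability of machine learning are limited, it still cannot handle every situation and therefore cannot fully replace humans in real-world applications. Because humans exhibit robustness and adaptability in complex scenarios, it is crucial to bring humans into the training loop of artificial intelligence (AI) and leverage human intelligence to further improve machine learning algorithms. This study develops a real-time human-guidance-based (Hug) deep reinforcement learning (DRL) method for policy training in an end-to-end autonomous driving case. Through a newly designed control-transfer mechanism between the human and the automation, the human can intervene and correct the agent's unreasonable actions in real time during model training. Based on this human-in-the-loop guidance mechanism, an improved actor-critic architecture with modified policy and value networks is developed. The fast convergence of the proposed Hug-DRL allows real-time human guidance actions to be fused into the agent's training loop, further improving the efficiency and performance of DRL. The method is validated by human-in-the-loop experiments with 40 subjects and compared with other state-of-the-art learning approaches. The results show that the method effectively improves the training efficiency and performance of the DRL algorithm under human guidance, without imposing specific requirements on participants' expertise or experience.

Keywords: human-in-the-loop AI; deep reinforcement learning; human guidance; autonomous driving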


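Anthropomorphic obstacle-avoidance trajectory planning for adaptive driving scenarios based on inverse reinforcement learning theory (Article)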

武健, 闫扬, 刘玉龙, 刘亚辉

Engineering, 2024, Vol. 33, No. 2, pp. 133-145, doi: 10.1016/j.eng.2023.07.018

Abstract:

The forward design of trajectory planning strategies requires preset trajectory optimization functions, resulting in poor adaptability of the strategy and an inability to accurately generate obstacle avoidance trajectories that conform to real driver behavior habits. In addition, owing to the strong time-varying dynamic characteristics of obstacle avoidance scenarios, it is necessary to design numerous trajectory optimization functions and adjust the corresponding parameters. Therefore, an anthropomorphic obstacle-avoidance trajectory planning strategy for adaptive driving scenarios is proposed. First, numerous expert-demonstrated trajectories are extracted from the HighD natural driving dataset. Subsequently, a trajectory expectation feature-matching algorithm is proposed that uses maximum entropy inverse reinforcement learning theory to learn the extracted expert-demonstrated trajectories and achieve automatic acquisition of the optimization function of the expert-demonstrated trajectory. Furthermore, a mapping model is constructed by combining the key driving scenario information that affects vehicle obstacle avoidance with the weight of the optimization function, and an anthropomorphic obstacle avoidance trajectory planning strategy for adaptive driving scenarios is proposed. Finally, the proposed strategy is verified based on real driving scenarios. The results show that the strategy can adjust the weight distribution of the trajectory optimization function in real time according to the “emergency degree” of obstacle avoidance and the state of the vehicle. Moreover, this strategy can generate anthropomorphic trajectories that are similar to expert-demonstrated trajectories, effectively improving the adaptability and acceptability of trajectories in driving scenarios.

Keywords: Obstacle avoidance trajectory planning; Inverse reinforcement theory; Anthropomorphic; Adaptive driving scenarios


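A coach-assisted multi-agent reinforcement learning framework for unexpectedly crashed agents (Research Article)

赵鉴, 赵有朋, 王维埙, 阳明宇, 胡迅晗, 周文罡, 郝建业, 李厚强

Frontiers of Information Technology & Electronic Engineering, 2022, Vol. 23, No. 7, pp. 1032-1042, doi: 10.1631/FITEE.2100594

Abstract: Multi-agent reinforcement learning is difficult to apply in practice, partly because of the gap between simulated and real-world environments. One reason for the gap is that simulated systems always assume that agents work normally all the time, whereas in practice one or more agents may unexpectedly "crash" during cooperation because of unavoidable hardware or software failures. Such crashes disrupt cooperation among agents and degrade system performance. In this paper, we give a formal definition of a cooperative multi-agent reinforcement learning system with unexpected crashes. To make the system more robust to crashes, we propose a coach-assisted multi-agent reinforcement learning framework that introduces a virtual coach agent during training to adjust the crash rate. Three coaching strategies and a re-sampling strategy are designed for the coach agent. To the best of our knowledge, this is the first work to study unexpected crashes in multi-agent systems. Extensive experiments on grid-world and StarCraft micromanagement tasks show that the adaptive strategy is more effective than the fixed-crash-rate and curriculum-learning coaching strategies. Ablation studies further demonstrate the effectiveness of the re-sampling strategy.

Keywords: multi-agent system; reinforcement learning; unexpectedly crashed agents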


Cooperative channel assignment for vehicular ad hoc networks based on multi-agent reinforcement learning (Research Article)

王云鹏,郑坤贤,田大新,段续庭,周建山

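Frontiers of Information Technology & Electronic Engineering, 2020, Vol. 21, No. 7, pp. 1047-1058, doi: 10.1631/FITEE.1900308

Abstract: Dynamic channel assignment (DCA) plays a key role in extending the capacity of vehicular ad hoc networks and alleviating their congestion. However, in vehicle-to-vehicle direct communication scenarios, channel assignment faces mutual influence among large numbers of nodes, a lack of centralized coordination, unknown global network state information, and other challenges. To address this problem, a multi-agent reinforcement learning (RL) based cooperative DCA (RL-CDCA) mechanism is proposed. Specifically, each vehicle node can learn, from real-time channel state information, proper strategies for channel selection and for adaptive backoff in channel access by means of two cooperating RL models. In addition, a neural network is constructed as a nonlinear Q-function approximator, which helps map sensed continuous input values to a mixed-policy output. Nodes driven by multi-agent RL-CDCA share their local rewards and incorporate the rewards of other nodes in the region, so that they can optimize their individual policies in a distributed, cooperative way. Simulation results show that, compared with four existing mechanisms, the proposed multi-agent RL-CDCA reduces the one-hop packet delay by no less than 73.73%, improves the average packet delivery ratio by no less than 12.66%, and better guarantees fairness in network resource allocation, even when vehicles on the road network are highly dense.

Keywords: vehicular ad hoc networks; reinforcement learning; dynamic channel assignment; multichannel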


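A hybrid reinforcement learning approach to pedestrian collision avoidance for autonomous vehicles (Research Article)

李惠乾, 黄晋, 曹重, 杨殿阁, 钟志华

Frontiers of Information Technology & Electronic Engineering, 2023, Vol. 24, No. 1, pp. 131-140, doi: 10.1631/FITEE.2200128

Abstract: Ensuring pedestrian safety is both essential and challenging for autonomous vehicles. Classical pedestrian collision-avoidance strategies cannot handle uncertainty, while learning-based methods lack explicit performance guarantees. This paper proposes a hybrid reinforcement learning approach to pedestrian collision avoidance that lets autonomous vehicles interact safely with pedestrians whose behavior is uncertain. The method integrates a rule-based policy and a reinforcement learning policy, and an activation function is designed to select whichever has the higher confidence as the final policy, which guarantees that the final policy performs no worse than the rule-based policy. To demonstrate its effectiveness, the method is validated in simulation using an accelerated testing technique that generates pedestrians with stochastic behavior. The results show that the method raises the success rate in the test scenarios to 98.8%, compared with 94.4% for the baseline method.

Keywords: pedestrian; hybrid reinforcement learning; autonomous vehicles; decision-making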
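The selection step described in the abstract above, choosing between a rule-based policy and a learned policy according to confidence, might look roughly like the following. This is a minimal sketch under stated assumptions: the `Proposal` type, the `select_action` name, the meaning of `action`, and the hard-switch rule with ties going to the rule-based policy are all hypothetical. The abstract does not say how confidence is computed or what form the activation function takes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    """An action suggested by one policy, with that policy's confidence."""
    action: float      # hypothetical: e.g. a commanded acceleration
    confidence: float  # hypothetical score; its computation is not given in the abstract


def select_action(rule: Proposal, learned: Proposal) -> Proposal:
    """Pick the learned proposal only when it is strictly more confident.

    Ties fall back to the rule-based proposal, an assumption made here so that
    the rule-based policy remains the default.
    """
    return learned if learned.confidence > rule.confidence else rule
```

Defaulting to the rule-based proposal is one simple way to keep the fallback behavior predictable; the paper's guarantee that the final policy is no worse than the rule-based one rests on its own confidence design, which this sketch does not reproduce.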


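A behavioral-control task supervisor with memory based on reinforcement learning for human-multi-robot coordination systems (Research Article)

黄捷, 莫智斌, 张祯毅, 陈宇韬

Frontiers of Information Technology & Electronic Engineering, 2022, Vol. 23, No. 8, pp. 1174-1188, doi: 10.1631/FITEE.2100280

Abstract: A reinforcement learning task supervisor (RLTS) with memory, built on a behavioral control framework, is proposed for human-multi-robot coordination systems. Because of repeated human intervention, existing human-multi-robot coordination systems suffer from high decision-making time cost and large task tracking errors, which limits the autonomy of multi-robot systems. Moreover, task supervisors built on the null-space-based behavioral control framework rely on manually defined priority-switching rules, which makes it difficult to realize an optimal behavior-priority adjustment strategy with multiple robots and multiple tasks. The proposed RLTS with memory integrates a deep Q-network and a long short-term memory (LSTM) knowledge base within the null-space-based behavioral control framework, achieving an optimal behavior-priority adjustment strategy when tasks conflict and reducing the frequency of human intervention. When a robot's confidence is insufficient in an emergency, the RLTS memorizes the history of human intervention and reloads the historical control signal when the same intervention situation recurs. Simulation results verify the effectiveness of the method. Finally, experiments with a group of mobile robots subject to external noise and disturbances verify its effectiveness in uncertain real-world environments.

Keywords: human-multi-robot coordination systems; null-space-based behavioral control; task supervisor; reinforcement learning; knowledge base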


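Optimal bidding and operation of a power plant with monoethanolamine-based carbon capture in a carbon allowance market: a solution using the reinforcement-learning-based Sarsa temporal-difference algorithm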

李子昂, 王美宏, 丁正桃

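Engineering, 2017, Vol. 3, No. 2, pp. 257-265, doi: 10.1016/J.ENG.2017.02.014

Abstract:

For a coal-fired power plant with monoethanolamine (MEA)-based carbon capture operating in a carbon allowance market, this paper applies the reinforcement-learning-based Sarsa temporal-difference algorithm to search autonomously for a unified bidding and operation strategy. The decision-maker's objective is defined as maximizing the discounted cumulative profit over the plant's lifetime. Two constraints are introduced: first, the trade-off between the high energy consumption of carbon capture and electricity generation; second, the approximate equality between the carbon allowances won in the emissions trading market and the actual carbon emissions from power generation. Three case studies are presented. In the first, we show that the Sarsa algorithm converges to a deterministic, optimized bidding and operation strategy. In the second, independently designed operation and bidding strategies are compared with the jointly designed strategy to show that, with a time-varying, market-oriented carbon capture level, the Sarsa algorithm helps the decision-maker obtain a higher discounted cumulative profit. The third introduces another plant in the same carbon allowance market as a competitor. The two plants have identical generation and CO2 capture equipment, but the new plant uses a different strategy to earn profit. Comparing the discounted cumulative profits of the two shows that the original plant, which uses Sarsa learning to find a unified bidding and operation strategy, is more competitive.

Keywords: power plant; post-combustion carbon capture; chemical absorption; carbon allowance market; decision-making optimization; reinforcement learning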
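For readers unfamiliar with the Sarsa temporal-difference algorithm named in the last entry, the standard tabular update is sketched below. This is the textbook form only, not the paper's implementation: the state and action labels, the dictionary-based Q-table, and the default values of `alpha` and `gamma` are hypothetical choices for illustration.

```python
from typing import Dict, Hashable, Tuple

QTable = Dict[Tuple[Hashable, Hashable], float]


def sarsa_update(q: QTable, state, action, reward: float,
                 next_state, next_action,
                 alpha: float = 0.1, gamma: float = 0.95) -> float:
    """Apply one on-policy Sarsa update in place and return the TD error.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * Q(s', a') - Q(s, a)),
    where a' is the action actually chosen in s' by the current policy.
    Unvisited state-action pairs are treated as having value 0.0.
    """
    current = q.get((state, action), 0.0)
    td_target = reward + gamma * q.get((next_state, next_action), 0.0)
    td_error = td_target - current
    q[(state, action)] = current + alpha * td_error
    return td_error
```

Because the target uses the action the agent really takes next, rather than the greedy maximum, Sarsa evaluates the policy it is following. Setting `gamma` below 1 corresponds to discounting future rewards, in the same spirit as the discounted cumulative profit in the abstract.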