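Deep Reinforcement Learning-based Multi-Objective Scheduling for Distributed Heterogeneous Hybrid Flow Shops with Blocking Constraints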

Xueyan Sun; Weiming Shen; Jiaxin Fan; Birgit Vogel-Heuser; Fandi Bi; Chunjiang Zhang

doi:10.1016/j.eng.2024.11.033


Engineering, 2025, Vol. 46, Issue 3: 278-291. DOI: 10.1016/j.eng.2024.11.033

Research Article


Xueyan Sun ^a ,
Weiming Shen ^a^,^* ,
Jiaxin Fan ^a^,^* ,
Birgit Vogel-Heuser ^b ,
Fandi Bi ^b ,
Chunjiang Zhang ^a


Deep Reinforcement Learning-based Multi-Objective Scheduling for Distributed Heterogeneous Hybrid Flow Shops with Blocking Constraints


Abstract

This paper investigates a distributed heterogeneous hybrid blocking flow-shop scheduling problem (DHHBFSP) designed to minimize the total tardiness and total energy consumption simultaneously, and proposes an improved proximal policy optimization (IPPO) method to make real-time decisions for the DHHBFSP. A multi-objective Markov decision process is modeled for the DHHBFSP, where the reward function is represented by a vector with dynamic weights instead of the common objective-related scalar value. A factory agent (FA) is formulated for each factory to select unscheduled jobs and is trained by the proposed IPPO to improve the decision quality. Multiple FAs work asynchronously to allocate jobs that arrive randomly at the shop. A two-stage training strategy is introduced in the IPPO, which learns from both single- and dual-policy data for better data utilization. The proposed IPPO is tested on randomly generated instances and compared with variants of the basic proximal policy optimization (PPO), dispatch rules, multi-objective metaheuristics, and multi-agent reinforcement learning methods. Extensive experimental results suggest that the proposed strategies offer significant improvements to the basic PPO, and the proposed IPPO outperforms the state-of-the-art scheduling methods in both convergence and solution quality.
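As a rough illustration of two ideas named in the abstract, the sketch below combines a vector-valued reward with a weight vector and evaluates the standard clipped surrogate term of PPO for a single sample. The two-component reward layout (tardiness, energy), the example weights, and the function names `scalarize` and `ppo_clip_term` are assumptions made for illustration; this is not the paper's IPPO implementation, and how the weights change over time is not specified here.

```python
import math
from typing import Sequence


def scalarize(reward_vec: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of a vector reward (assumed order: tardiness, energy)."""
    if len(reward_vec) != len(weights):
        raise ValueError("reward and weight vectors must have equal length")
    return sum(w * r for w, r in zip(weights, reward_vec))


def ppo_clip_term(logp_new: float, logp_old: float,
                  advantage: float, eps: float = 0.2) -> float:
    """Standard PPO clipped surrogate for one sample (to be maximized)."""
    # Probability ratio between the updated policy and the data-collecting policy.
    ratio = math.exp(logp_new - logp_old)
    # Restrict the ratio to [1 - eps, 1 + eps].
    clipped = max(1.0 - eps, min(ratio, 1.0 + eps))
    # Take the pessimistic (smaller) of the unclipped and clipped objectives.
    return min(ratio * advantage, clipped * advantage)


# Hypothetical step: both objectives are costs, so rewards are negative.
reward = (-2.0, -4.0)    # (tardiness penalty, energy penalty), made-up values
weights = (0.5, 0.5)     # assumed weights; they may differ from step to step
scalar_reward = scalarize(reward, weights)
```

Keeping the reward as a vector until the last moment means the same collected data can be re-weighted when the preference between tardiness and energy changes, which is the usual motivation for dynamic weights.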

Keywords

Multi-objective Markov decision process / Multi-agent deep reinforcement learning / Proximal policy optimization / Distributed hybrid flow-shop scheduling / Blocking constraints

Cite this article

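Xueyan Sun, Weiming Shen, Jiaxin Fan, et al. Deep Reinforcement Learning-based Multi-Objective Scheduling for Distributed Heterogeneous Hybrid Flow Shops with Blocking Constraints. Engineering. 2025, 46(3): 278-291. https://doi.org/10.1016/j.eng.2024.11.033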
