
Solving non-permutation flow-shop scheduling problem via a novel deep reinforcement learning approach. (English) Zbl 1542.90125

Summary: The non-permutation flow-shop scheduling problem (NPFS) is studied. We model it as a Markov decision process, providing a natural setting for reinforcement learning (RL) algorithms. Although RL approaches with function approximation generate long sequences of strongly correlated states, few studies have exploited the connections within these state sequences; most merely shuffle their order. To this end, this paper proposes a novel deep reinforcement learning algorithm, named LSTM-TD(0), to address NPFS. Specifically, we design fifteen state features to represent the production state at each scheduling point and fourteen actions to choose an unprocessed operation on a given machine. This study applies a long short-term memory (LSTM) network to capture the intrinsic connections within the state sequences in RL-based scheduling approaches. Moreover, we combine the LSTM model with the one-step temporal difference (TD(0)) algorithm to select each action impartially with respect to the state value, avoiding the frequent overestimation of action values in Q-learning. The proposed LSTM-TD(0) was trained using two LSTM networks and enhanced by redesigning the reward value. A series of comparative experiments were conducted among simple heuristic rules, metaheuristic rules, general DRL methods, and LSTM-TD(0) on a group of well-known benchmark problems of different scales. The results confirm both the superiority and the universality of LSTM-TD(0) over its competitors. Scalability tests reveal that our approach generalizes to instances of different sizes without retraining or knowledge transfer.
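The one-step temporal-difference backup underlying the authors' approach can be sketched in tabular form (a toy illustration only; the paper approximates the state value with an LSTM over fifteen production-state features, and the states, rewards, and learning rate below are hypothetical):

```python
# Minimal tabular TD(0) sketch of the state-value backup
#   V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s)),
# the update that LSTM-TD(0) applies through its value network.
# States and rewards here come from a fictitious scheduling trace.
from collections import defaultdict

def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.99, terminal=False):
    """Apply one TD(0) backup to the state-value table V in place."""
    target = r if terminal else r + gamma * V[s_next]
    V[s] += alpha * (target - V[s])
    return V[s]

V = defaultdict(float)  # all state values start at 0
# (state, reward, next_state) transitions of a short fictitious episode;
# negative rewards stand in for elapsed processing time.
trajectory = [("s0", -3.0, "s1"), ("s1", -2.0, "s2")]
for s, r, s_next in trajectory:
    td0_update(V, s, r, s_next)
td0_update(V, "s2", -1.0, None, terminal=True)  # terminal backup uses r alone
```

Because the update bootstraps from the learned state value rather than a max over action values, it avoids the maximization bias that inflates Q-learning estimates, which is the motivation the summary gives for pairing TD(0) with the LSTM.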

MSC:

90B35 Deterministic scheduling theory in operations research
68T05 Learning and adaptive systems in artificial intelligence
90C59 Approximation methods and heuristics in mathematical programming
Full Text: DOI

References:

[1] Averbakh, I.; Berman, O., A simple heuristic for m-machine flow-shop and its applications in routing-scheduling problems, Oper. Res., 47, 1, 165-170 (1999) · Zbl 1046.90027
[2] Baker, K. R., Introduction to Sequencing and Scheduling (1974), John Wiley & Sons
[3] Benavides, A. J.; Ritt, M., Two simple and effective heuristics for minimizing the makespan in non-permutation flow shops, Comput. Oper. Res., 66, 160-169 (2016) · Zbl 1349.90318
[4] Boukef, H.; Benrejeb, M.; Borne, P., A proposed genetic algorithm coding for flow-shop scheduling problems, Int. J. Comput. Commun. Control, 2, 3, 229-240 (2007)
[5] Campbell, H. G.; Dudek, R. A.; Smith, M. L., A heuristic algorithm for the n job, m machine sequencing problem, Manage. Sci., 16, 10, B-630 (1970) · Zbl 0194.50504
[6] Chandra, P.; Mehta, P.; Tirupati, D., Permutation flow shop scheduling with earliness and tardiness penalties, Int. J. Prod. Res., 47, 20, 5591-5610 (2009) · Zbl 1198.90172
[7] Chen, R.; Yang, B.; Li, S.; Wang, S., A self-learning genetic algorithm based on reinforcement learning for flexible job-shop scheduling problem, Comput. Ind. Eng., 149, Article 106778 pp. (2020)
[8] Demirkol, E.; Mehta, S.; Uzsoy, R., Benchmarks for shop scheduling problems, European J. Oper. Res., 109, 1, 137-141 (1998) · Zbl 0951.90022
[9] Fernandez-Viagas, V.; Framinan, J. M., A best-of-breed iterated greedy for the permutation flowshop scheduling problem with makespan objective, Comput. Oper. Res., 112, Article 104767 pp. (2019) · Zbl 1458.90290
[10] Fernandez-Viagas, V.; Ruiz, R.; Framinan, J. M., A new vision of approximate methods for the permutation flowshop to minimise makespan: State-of-the-art and computational evaluation, European J. Oper. Res., 257, 3, 707-721 (2017) · Zbl 1394.90271
[11] Graham, R. L.; Lawler, E. L.; Lenstra, J. K.; Kan, A. R., Optimization and approximation in deterministic sequencing and scheduling: a survey, (Annals of Discrete Mathematics, vol. 5 (1979), Elsevier), 287-326 · Zbl 0411.90044
[12] Han, W.; Guo, F.; Su, X., A reinforcement learning method for a hybrid flow-shop scheduling problem, Algorithms, 12, 11, 222 (2019)
[13] Hochreiter, S.; Schmidhuber, J., Long short-term memory, Neural Comput., 9, 8, 1735-1780 (1997)
[14] Konda, V. R.; Tsitsiklis, J. N., Actor-critic algorithms, (Advances in Neural Information Processing Systems (2000)), 1008-1014
[15] Lillicrap, T. P.; Hunt, J. J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D., Continuous control with deep reinforcement learning (2015), arXiv preprint arXiv:1509.02971
[16] Luo, S., Dynamic scheduling for flexible job shop with new job insertions by deep reinforcement learning, Appl. Soft Comput., 91, Article 106208 pp. (2020)
[17] Marinakis, Y.; Marinaki, M., Particle swarm optimization with expanding neighborhood topology for the permutation flowshop scheduling problem, Soft Comput., 17, 7, 1159-1173 (2013)
[18] McMahon, G.; Burton, P., Flow-shop scheduling with the branch-and-bound method, Oper. Res., 15, 3, 473-481 (1967)
[19] Mnih, V.; Badia, A. P.; Mirza, M.; Graves, A.; Lillicrap, T.; Harley, T.; Silver, D.; Kavukcuoglu, K., Asynchronous methods for deep reinforcement learning, (International Conference on Machine Learning (2016), PMLR), 1928-1937
[20] Mnih, V.; Kavukcuoglu, K.; Silver, D.; Graves, A.; Antonoglou, I.; Wierstra, D.; Riedmiller, M., Playing atari with deep reinforcement learning (2013), arXiv preprint arXiv:1312.5602
[21] Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A. A.; Veness, J.; Bellemare, M. G.; Graves, A.; Riedmiller, M.; Fidjeland, A. K.; Ostrovski, G., Human-level control through deep reinforcement learning, Nature, 518, 7540, 529-533 (2015)
[22] Nawaz, M.; Enscore Jr., E. E.; Ham, I., A heuristic algorithm for the m-machine, n-job flow-shop sequencing problem, Omega, 11, 1, 91-95 (1983)
[23] Pan, R.; Dong, X.; Han, S., Solving permutation flowshop problem with deep reinforcement learning, (2020 Prognostics and Health Management Conference (PHM-Besançon) (2020), IEEE), 349-353
[24] Ren, J.; Ye, C.; Yang, F., A novel solution to JSPS based on long short-term memory and policy gradient algorithm, Int. J. Simul. Model., 19, 1, 157-168 (2020)
[25] Ren, J.; Ye, C.; Yang, F., Solving flow-shop scheduling problem with a reinforcement learning algorithm that generalizes the value function with neural network, Alex. Eng. J., 60, 3, 2787-2800 (2021)
[26] Reyna, Y. C. F.; Cáceres, A. P.; Jiménez, Y. M.; Reyes, Y. T., An improvement of reinforcement learning approach for permutation of flow-shop scheduling problems, Rev. Ibérica Sist. Tecnol. Inf., E18, 257-270 (2019)
[27] Röck, H.; Schmidt, G., Machine Aggregation Heuristics in Shop Scheduling (1982), Univ.-Bibl. d. Techn. Univ.
[28] Ronconi, D. P.; Birgin, E. G., Mixed-integer programming models for flowshop scheduling problems minimizing the total earliness and tardiness, (Just-in-Time Systems (2012), Springer), 91-105 · Zbl 1355.90028
[29] Rossi, A.; Lanzetta, M., Native metaheuristics for non-permutation flowshop scheduling, J. Intell. Manuf., 25, 6, 1221-1233 (2014)
[30] Rossit, D. A.; Tohmé, F.; Frutos, M., The non-permutation flow-shop scheduling problem: a literature review, Omega, 77, 143-153 (2018)
[31] Schaul, T.; Quan, J.; Antonoglou, I.; Silver, D., Prioritized experience replay (2015), arXiv preprint arXiv:1511.05952
[32] Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O., Proximal policy optimization algorithms (2017), arXiv preprint arXiv:1707.06347
[33] Stefan, P., Flow-shop scheduling based on reinforcement learning algorithm, Prod. Syst. Inf. Eng., 1, 1, 83-90 (2003)
[34] Sundermeyer, M., Schlüter, R., Ney, H., 2012. LSTM neural networks for language modeling. In: Thirteenth Annual Conference of the International Speech Communication Association.
[35] Sutton, R. S.; Barto, A. G., Reinforcement Learning: An Introduction (2018), MIT Press · Zbl 1407.68009
[36] Van Hasselt, H., Guez, A., Silver, D., 2016. Deep reinforcement learning with double q-learning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30, no. 1.
[37] Viloria, A.; Sierra, D. M.; Duran, S. E.; Rambal, E. P.; Hernández-Palma, H.; Ventura, J. M.; Pichon, A. R.; Torres, L. J.J., Optimization of flow shop scheduling through a hybrid genetic algorithm for manufacturing companies, (International Conference on Intelligent Computing, Information and Control Systems (2019), Springer), 20-29
[38] Wang, Y.; Gao, S.; Wang, S.; Zimmermann, R., An adaptive multi-objective multi-task service composition approach considering practical constraints in fog manufacturing, IEEE Trans. Ind. Inform. (2021)
[39] Wang, Z.; Schaul, T.; Hessel, M.; Hasselt, H.; Lanctot, M.; Freitas, N., Dueling network architectures for deep reinforcement learning, (International Conference on Machine Learning (2016), PMLR), 1995-2003
[40] Wang, Y.; Wang, S.; Kang, L.; Wang, S., An effective dynamic service composition reconfiguration approach when service exceptions occur in real-life cloud manufacturing, Robot. Comput.-Integr. Manuf., 71, Article 102143 pp. (2021)
[41] Williams, R. J., Simple statistical gradient-following algorithms for connectionist reinforcement learning, Mach. Learn., 8, 3, 229-256 (1992) · Zbl 0772.68076
[42] Xiao, P.; Zhang, C.; Meng, L.; Hong, H.; Dai, W., Study on non-permutation flow shop scheduling problem based on deep reinforcement learning, Jisuanji Jicheng Zhizao Xitong/Comput. Integr. Manuf. Syst. CIMS (2019)
[43] Xu, J.; Zhou, X., A class of multi-objective expected value decision-making model with birandom coefficients and its application to flow shop scheduling problem, Inform. Sci., 179, 17, 2997-3017 (2009) · Zbl 1170.90428
[44] Yang, S.; Xu, Z., Intelligent scheduling for permutation flow shop with dynamic job arrival via deep reinforcement learning, (2021 IEEE 5th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), vol. 5 (2021), IEEE), 2672-2677
[45] Yang, S.; Xu, Z.; Wang, J., Intelligent decision-making of scheduling for dynamic permutation flowshop via deep reinforcement learning, Sensors, 21, 3, 1019 (2021)
[46] Yankai, W.; Shilong, W.; Dong, L.; Chunfeng, S.; Bo, Y., An improved multi-objective whale optimization algorithm for the hybrid flow shop scheduling problem considering device dynamic reconfiguration processes, Expert Syst. Appl., 174, Article 114793 pp. (2021)
[47] Ying, K.-C., Solving non-permutation flowshop scheduling problems by an effective iterated greedy heuristic, Int. J. Adv. Manuf. Technol., 38, 3-4, 348 (2008)
[48] Ying, K.-C.; Lin, S.-W., Multi-heuristic desirability ant colony system heuristic for non-permutation flowshop scheduling problems, Int. J. Adv. Manuf. Technol., 33, 7-8, 793-802 (2007)
[49] Zhang, Z.; Wang, W.; Zhong, S.; Hu, K., Flow shop scheduling with reinforcement learning, Asia-Pac. J. Oper. Res., 30, 05, Article 1350014 pp. (2013) · Zbl 1278.90179
[50] Zhang, D.-Y.; Ye, C.-M., Reinforcement learning algorithm for permutation flow shop scheduling to minimize makespan, Comput. Syst. Appl., 28, 12, 195-199 (2019)
[51] Zhang, Z.; Zheng, L., Manufacturing System Scheduling Based on Reinforcement Learning, 49-54 (2016), Beijing Science Press
[52] Zhu, J.; Wang, H.; Zhang, T., A deep reinforcement learning approach to the flexible flowshop scheduling problem with makespan minimization, (2020 IEEE 9th Data Driven Control and Learning Systems Conference (DDCLS) (2020), IEEE), 1220-1225
[53] Ziaee, M., A heuristic algorithm for solving flexible job shop scheduling problem, Int. J. Adv. Manuf. Technol., 71, 1-4, 519-528 (2014)