
TD3-BC-PPO: twin delayed DDPG-based and behavior cloning-enhanced proximal policy optimization for dynamic optimization affine formation. (English) Zbl 1543.93194

Summary: This article addresses the dynamic optimal design problem of affine formations for adversarial multi-agent systems in real-time confrontation environments. To maximize both the win rate and the battle damage ratio, a novel deep reinforcement learning (DRL) algorithm named TD3-BC-PPO is designed to adaptively control the shape transformation of the affine formation, including translation, rotation, and scaling. Specifically, twin delayed deep deterministic policy gradient (TD3) is introduced to enhance policy exploration and to exploit the crucial but sparse win-loss rewards efficiently. Behavior cloning (BC) is used for strategy replication and model migration. Proximal policy optimization (PPO) is employed to stabilize policy updates, performing a secondary optimization of the policy with dense damage-related rewards. To validate the effectiveness and robustness of the proposed algorithm, extensive simulation experiments are conducted, with comparative analyses against several baselines across a diverse range of formation confrontation situations: various initial formation structures, differing attack angles, unequal numbers of combatants, and expanded confrontation space dimensions. Simulation results show that the proposed TD3-BC-PPO algorithm improves the win rate of the affine formation by at least 5.4% and its battle damage ratio by at least 0.274 for adversarial multi-agent systems in complex real-time confrontation scenarios.
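The summary describes a three-stage pipeline (TD3 pre-training on sparse win-loss rewards, behavior cloning to transfer the resulting policy, PPO fine-tuning on dense damage-related rewards) whose actions parameterize an affine transformation of the formation. The paper's own implementation is not reproduced here; the following is a minimal PyTorch sketch of two of those ideas: applying an affine action (rotation, scaling, translation) to a nominal formation, and the behavior-cloning bridge from a TD3 actor to a PPO actor. All network sizes, state/action dimensions, and the MSE cloning loss are illustrative assumptions, not the authors' code.

```python
import math
import torch
import torch.nn as nn

def apply_affine_action(nominal, theta, scale, translation):
    """Transform a nominal 2-D formation (n_agents x 2) by the rotation,
    scaling, and translation encoded in a policy action (all hypothetical)."""
    R = torch.tensor([[math.cos(theta), -math.sin(theta)],
                      [math.sin(theta),  math.cos(theta)]])
    return scale * nominal @ R.T + translation

def make_actor(state_dim=8, action_dim=4):
    # Assumed action layout: (rotation angle, scale, x-translation, y-translation).
    return nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                         nn.Linear(64, action_dim), nn.Tanh())

td3_actor = make_actor()  # stage 1: trained off-policy with TD3 on sparse win-loss rewards
ppo_actor = make_actor()  # stage 3: later refined on-policy with PPO on dense damage rewards

# Stage 2: behavior cloning -- regress the PPO actor onto the frozen TD3 policy.
opt = torch.optim.Adam(ppo_actor.parameters(), lr=1e-3)
for _ in range(500):
    states = torch.randn(256, 8)  # stand-in for states sampled from a replay buffer
    with torch.no_grad():
        target_actions = td3_actor(states)
    loss = nn.functional.mse_loss(ppo_actor(states), target_actions)
    opt.zero_grad(); loss.backward(); opt.step()

# Example: rotate a square formation by 45 degrees, halve it, shift it right.
square = torch.tensor([[1., 1.], [1., -1.], [-1., -1.], [-1., 1.]])
print(apply_affine_action(square, theta=math.pi / 4, scale=0.5,
                          translation=torch.tensor([2.0, 0.0])))
```

In the paper's setting, the cloned ppo_actor would then be fine-tuned with PPO's clipped surrogate objective on the dense damage-related rewards; that stage is omitted here for brevity.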

MSC:

93C40 Adaptive control/observation systems
68T07 Artificial neural networks and deep learning
93A16 Multi-agent systems
Full Text: DOI
