
Heterogeneous optimal formation control of nonlinear multi-agent systems with unknown dynamics by safe reinforcement learning. (English) Zbl 07748311

Summary: This article studies safe optimal formation control of heterogeneous nonlinear multi-agent systems under a distributed-training, decentralized-execution policy. The control objective is to guarantee safety while achieving optimal performance. This objective is achieved by introducing novel distributed optimization problems that combine a performance cost with local control barrier functions (CBFs): the formation task is encoded in the cost function, while each local CBF keeps its agent within the safe region. Rather than solving constrained optimization problems, the method obtains safe optimal controllers from unconstrained problems by incorporating the local CBFs, so the presented approach has a lower computational cost than constrained formulations. It is proven that adding the local CBF to the cost function affects neither the optimality nor the stability of the proposed controller. A safe optimal policy is derived iteratively using a new off-policy multi-agent reinforcement learning (MARL) algorithm that requires no knowledge of the agents' dynamics. Finally, the effectiveness of the proposed algorithm is evaluated through simulation of collision-free multi-quadrotor formation control.
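
As a minimal illustrative sketch (the notation below is assumed for exposition and is not taken verbatim from the paper), the CBF-augmented cost described above can be written as follows. Let \(e_i\) denote agent \(i\)'s local formation error, \(u_i\) its control input, and \(h_i\) a local safety function with \(h_i(x_i)\ge 0\) exactly on the safe set; an unconstrained cost enforcing both performance and safety is then
\[
J_i=\int_t^{\infty}\Big(e_i^{\top}Q_i e_i+u_i^{\top}R_i u_i+\gamma_i\,B\big(h_i(x_i(\tau))\big)\Big)\,\mathrm{d}\tau,\qquad B(h)=\log\frac{1+h}{h},
\]
with weights \(Q_i\succeq 0\), \(R_i\succ 0\) and \(\gamma_i>0\). Since \(B(h)\to\infty\) as \(h\to 0^{+}\), any policy with finite cost keeps the state in the interior of the safe set, and since \(B(h)\to 0\) as \(h\to\infty\), the penalty vanishes far from the boundary; this is the mechanism by which a safety constraint can be absorbed into an unconstrained cost without destroying optimality or stability.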

MSC:

93C15 Control/observation systems governed by ordinary differential equations
49K15 Optimality conditions for problems involving ordinary differential equations
49L20 Dynamic programming in optimal control and differential games
93A16 Multi-agent systems
93D30 Lyapunov and storage functions
93C41 Control/observation systems with incomplete information
Full Text: DOI

References:

[1] Rizk, Y.; Awad, M.; Tunstel, E. W., Cooperative Heterogeneous Multi-Robot Systems: a Survey, ACM Comput. Surv., 52, 2, 1-31 (2020)
[2] Amirian, N.; Shamaghdari, S., Distributed resilient flocking control of multi-agent systems through event/self-triggered communication, IET Control Theory Applic., 15, 4, 559-569 (2021)
[3] Hua, Y.; Dong, X.; Li, Q.; Ren, Z., Distributed adaptive formation tracking for heterogeneous multiagent systems with multiple nonidentical leaders and without well-informed follower, Int. J. Robust Nonlinear Control, 30, 6, 2131-2151 (2020) · Zbl 1465.93104
[4] Wang, L.; Xi, J.; He, M.; Liu, G., Robust time-varying formation design for multiagent systems with disturbances: extended-state-observer method, Int. J. Robust Nonlinear Control, 30, 7, 2796-2808 (2020) · Zbl 1465.93049
[5] Liu, H.; Peng, F.; Modares, H.; Kiumarsi, B., Heterogeneous formation control of multiple rotorcrafts with unknown dynamics by reinforcement learning, Inf. Sci., 558, 194-207 (2021) · Zbl 1489.93007
[6] Lin, W.; Zhao, W.; Liu, H., Robust optimal formation control of heterogeneous multi-agent system via reinforcement learning, IEEE Access, 8, 218424-218432 (2020)
[7] Sutton, R.; Barto, A., Reinforcement Learning: An Introduction (2018), MIT Press: Cambridge, MA · Zbl 1407.68009
[8] Lewis, F.; Vrabie, D., Reinforcement learning and adaptive dynamic programming for feedback control, IEEE Circuits Syst. Mag., 9, 3, 32-50 (2009)
[9] Long, P.; Fan, T.; Liao, X.; Liu, W.; Zhang, H.; Pan, J., Towards optimally decentralized multi-robot collision avoidance via deep reinforcement learning, (Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia (2018))
[10] Panagiaris, N.; Hart, E.; Gkatzia, D., Generating unambiguous and diverse referring expressions, Comput. Speech Lang., 68, 101184 (2021)
[11] Matta, M.; Cardarilli, G.; Di Nunzio, L.; Fazzolari, R.; Giardino, D.; Nannarelli, A.; Re, M.; Spanò, S., A reinforcement learning-based QAM/PSK symbol synchronizer, IEEE Access, 7, 124147-124157 (2019)
[12] Pakkhesal, S.; Shamaghdari, S., SOS-based policy iteration for H∞ control of polynomial systems with uncertain, Int. J. Control (2022), in press
[13] Mu, C.; Zhao, Q.; Gao, Z.; Sun, C., Q-learning solution for optimal consensus control of discrete-time multiagent systems using reinforcement learning, J. Franklin Inst., 356, 13, 6946-6967 (2019) · Zbl 1418.93250
[14] Peng, Z.; Zhao, Y.; Hu, J.; Ghosh, B., Data-driven optimal tracking control of discrete-time multi-agent systems with two-stage policy iteration algorithm, Inf. Sci., 481, 189-202 (2019) · Zbl 1451.93228
[15] Peng, Z.; Hu, J.; Shi, K.; Luo, R.; Huang, R.; Ghosh, B.; Huang, J., A novel optimal bipartite consensus control scheme for unknown multi-agent systems via model-free reinforcement learning, Appl. Math. Comput., 369, 124821 (2020) · Zbl 1433.93008
[16] Li, J.; Ji, L.; Li, H., Optimal consensus control for unknown second-order multi-agent systems: using model-free reinforcement learning method, Appl. Math. Comput., 410, 126451 (2021) · Zbl 1510.93020
[17] Zhang, J.; Wang, Z.; Zhang, H., Data-based optimal control of multiagent systems: a reinforcement learning design approach, IEEE Trans. Cybern., 49, 12, 4441-4449 (2019)
[18] Bai, H.; George, J.; Chakrabortty, A., Hierarchical control of multi-agent systems using online reinforcement learning, (Proceedings of the American Control Conference (ACC), Denver, CO, USA (2020))
[19] Chen, C.; Lewis, F.; Xie, K.; Xie, S.; Liu, Y., Off-policy learning for adaptive optimal output synchronization of heterogeneous multi-agent systems, Automatica, 119, 109081 (2020) · Zbl 1451.93012
[20] Li, Q.; Xia, L.; Song, R.; Liu, J., Leader-follower bipartite output synchronization on signed digraphs under adversarial factors via data-based reinforcement learning, IEEE Trans. Neural Netw. Learn. Syst., 31, 10, 4185-4195 (2020)
[21] Ji, L.; Wang, C.; Zhang, C.; Wang, H.; Li, H., Optimal consensus model-free control for multi-agent systems subject to input delays and switching topologies, Inf. Sci., 589, 497-515 (2022) · Zbl 1536.93798
[22] Wen, G.; Chen, C. L.P.; Li, B., Optimized formation control using simplified reinforcement learning for a class of multiagent systems with unknown dynamics, IEEE Trans. Ind. Electron., 67, 9, 7879-7888 (2020)
[23] Li, H.; Wu, Y.; Chen, M., Adaptive fault-tolerant tracking control for discrete-time multiagent systems via reinforcement learning algorithm, IEEE Trans. Cybern., 51, 3, 1163-1174 (2021)
[24] Qu, Q.; Sun, L.; Li, Z., Adaptive critic design-based robust cooperative tracking control for nonlinear multi-agent systems with disturbances, IEEE Access, 9, 34383-34394 (2021)
[25] Tatari, F.; Vamvoudakis, K. G.; Mazouchi, M., Optimal distributed learning for disturbance rejection in networked nonlinear games under unknown dynamics, IET Control Theory Applic., 13, 17, 2838-2848 (2019)
[26] Zhao, W.; Liu, H.; Lewis, F. L.; Wang, X., Data-driven optimal formation control for quadrotor team with unknown dynamics, IEEE Trans. Cybern., 52, 8, 7889-7898 (2022)
[27] Bastani, O., Safe reinforcement learning with nonlinear dynamics via model predictive shielding, (Proceedings of the American Control Conference (ACC), New Orleans, LA, USA (2021))
[28] Bastani, O.; Li, S.; Xu, A., Safe reinforcement learning via statistical model predictive shielding, (Robotics: Science and Systems (2021), held virtually)
[29] Zanon, M.; Gros, S., Safe reinforcement learning using robust MPC, IEEE Trans. Automat. Contr., 66, 8, 3638-3652 (2021) · Zbl 1471.93093
[30] Yang, Y.; Vamvoudakis, K. G.; Modares, H.; He, W.; Yin, Y.; Wunsch, D. C., Safe intermittent reinforcement learning for nonlinear systems, (Proceedings of the IEEE 58th Conference on Decision and Control (CDC), Nice, France (2019))
[31] Yang, Y.; Ding, D.; Xiong, H.; Yin, Y.; Wunsch, D. C., Online barrier-actor-critic learning for H∞ control with full-state constraints and input saturation, J. Franklin Inst., 357, 6, 3316-3344 (2020) · Zbl 1437.93027
[32] Yang, Y.; Vamvoudakis, K. G.; Modares, H., Safe reinforcement learning for dynamical games, Int. J. Robust Nonlinear Control, 30, 9, 3706-3726 (2020) · Zbl 1466.91038
[33] Yazdani, N. M.; Moghaddam, R. K.; Kiumarsi, B.; Modares, H., A safety-certified policy iteration algorithm for control of constrained nonlinear systems, IEEE Control Syst. Lett., 4, 3, 686-691 (2020)
[34] Marvi, Z.; Kiumarsi, B., Safe reinforcement learning: a control barrier function optimization approach, Int. J. Robust Nonlinear Control, 31, 6, 1923-1940 (2021) · Zbl 1526.93181
[35] Qin, J.; Li, M.; Shi, Y.; Ma, Q.; Zheng, W. X., Optimal synchronization control of multiagent systems with input saturation via off-policy reinforcement learning, IEEE Trans. Neural Netw. Learn. Syst., 30, 1, 85-96 (2019)
[36] Yan, B.; Shi, P.; Lim, C.; Shi, Z., Optimal robust formation control for heterogeneous multi-agent systems based on reinforcement learning, Int. J. Robust Nonlinear Control, 32, 5, 2683-2704 (2022) · Zbl 1527.93075
[37] Olfati-Saber, R.; Murray, R. M., Consensus problems in networks of agents with switching topology and time-delays, IEEE Trans. Automat. Contr., 49, 9, 1520-1533 (2004) · Zbl 1365.93301
[38] Wang, J. L.; Wu, H. N., Leader-following formation control of multi-agent systems under fixed and switching topologies, Int. J. Control, 85, 6, 695-705 (2012) · Zbl 1256.93013
[39] Khalil, H. K., Nonlinear Systems (2002), Prentice Hall: Upper Saddle River, New Jersey · Zbl 1003.34002
[40] Jiang, Y.; Jiang, Z., Robust adaptive dynamic programming and feedback stabilization of nonlinear systems, IEEE Trans. Neural Netw. Learn. Syst., 25, 5, 882-893 (2014)
[41] Modares, H.; Lewis, F.; Jiang, Z., H∞ tracking control of completely unknown continuous-time systems via off-policy reinforcement learning, IEEE Trans. Neural Netw. Learn. Syst., 26, 10, 2550-2562 (2015)