
Optimal antisynchronization control for unknown multiagent systems with deep deterministic policy gradient approach. (English) Zbl 1536.93055

Summary: Optimal antisynchronization control for unknown multiagent systems (MASs) with cooperative-competitive (i.e., coopetition) interactions is challenging because of the complex connection characteristics and the coupling among agents. This paper proposes a reinforcement learning algorithm based on coopetition strength (CS) to achieve optimal antisynchronization in unknown MASs. First, an innovative CS function is introduced, with which the local state-error information of the agents is redefined. A novel policy iteration method is then proposed to approximate each agent's optimal control policy, and a convergence analysis of the algorithm is presented based on Lyapunov stability and functional analysis. For the data-based implementation of the control policy, an actor-critic (AC) network structure is designed; to improve the robustness of the control policy, a target network and experience replay (ER) are introduced into the training process. Finally, the algorithm's effectiveness is validated through comparative numerical simulations.
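The target-network and experience-replay machinery mentioned in the summary is the standard DDPG apparatus. A minimal, self-contained sketch of these two components (not the authors' implementation; class names, buffer layout, and the parameter `tau` are illustrative assumptions) might look like:

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Fixed-capacity experience replay: stores (s, a, r, s') transitions
    and returns uniformly sampled minibatches, breaking the temporal
    correlation of consecutive samples during training."""

    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)  # oldest transitions evicted first
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        batch = self.rng.sample(list(self.buffer), batch_size)
        states, actions, rewards, next_states = map(np.array, zip(*batch))
        return states, actions, rewards, next_states

    def __len__(self):
        return len(self.buffer)


def soft_update(target_weights, online_weights, tau=0.01):
    """Polyak-averaged target-network update used in DDPG:
    theta_target <- tau * theta_online + (1 - tau) * theta_target.
    A small tau makes the target drift slowly, stabilizing the critic's
    bootstrap targets."""
    return tau * online_weights + (1.0 - tau) * target_weights
```

In a full DDPG loop, each environment step would push a transition into the buffer, sample a minibatch to update the critic and actor, and then call `soft_update` on both target networks.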

MSC:

93A16 Multi-agent systems
93C40 Adaptive control/observation systems
Full Text: DOI

References:

[1] Fan, D. D.; Theodorou, E. A.; Reeder, J., Model-based stochastic search for large scale optimization of multi-agent UAV swarms, IEEE Symp. Series Comput. Intell. (SSCI), 2216-2222 (2018)
[2] Wen, G.; Hao, W.; Feng, W.; Gao, K., Optimized backstepping tracking control using reinforcement learning for quadrotor unmanned aerial vehicle system, IEEE Trans. Syst. Man Cybern., doi:10.1109/TSMC.2021.3112688
[3] Dai, H.; Jia, J.; Yan, L.; Fang, X.; Chen, W., Distributed fixed-time optimization in economic dispatch over directed networks, IEEE Trans. Ind. Inf., 17, 5, 3011-3019 (2021)
[4] Li, C.; Yu, X.; Huang, T.; He, X., Distributed optimal consensus over resource allocation network and its application to dynamical economic dispatch, IEEE Trans. Neural Netw. Learn. Syst., 29, 6, 2407-2418 (2018)
[5] Chowdhury, D.; Khalil, H. K., Practical Synchronization in Networks of Nonlinear Heterogeneous Agents with Application to Power Systems, IEEE Trans. Autom. Control, 66, 1, 184-198 (2021) · Zbl 1536.93836
[6] Hu, J.; Hu, X.; Shen, T., Cooperative shift estimation of target trajectory using clustered sensors, J. Syst. Sci. Complexity, 27, 3, 413-429 (2014) · Zbl 1303.93166
[7] Tao, J.; Wu, Z.; Su, H.; Wu, Y.; Zhang, D., Asynchronous and Resilient Filtering for Markovian Jump Neural Networks Subject to Extended Dissipativity, IEEE Trans. Cybern., 49, 7, 2504-2513 (2019)
[8] Shi, K.; Wang, J.; Zhong, S.; Zhang, X.; Liu, Y.; Cheng, J., New reliable nonuniform sampling control for uncertain chaotic neural networks under Markov switching topologies, Appl. Math. Comput., 347, 169-193 (2019) · Zbl 1428.92015
[9] Zhang, H.; Jiang, H.; Luo, Y.; Xiao, G., Data-driven optimal consensus control for discrete-time multi-agent systems with unknown dynamics using reinforcement learning method, IEEE Trans. Ind. Electron., 64, 5, 4091-4100 (2017)
[10] Zhang, H.; Feng, T.; Yang, G.; Liang, H., Distributed cooperative optimal control for multiagent systems on directed graphs: An inverse optimal approach, IEEE Trans. Cybern., 45, 7, 1315-1326 (2014)
[11] Astarita, V.; Festa, D. C.; Giofrè, V. P., Cooperative-competitive paradigm in traffic signal synchronization based on floating car data, EEEIC / I&CPS Europe, 1-6 (2018)
[12] Li, K.; Ji, L.; Yang, S.; Li, H.; Liao, X., Couple-group consensus of cooperative-competitive heterogeneous multiagent systems: A fully distributed event-triggered and pinning control method, IEEE Trans. Cybern. (2020), early access
[13] Altafini, C., Consensus problems on networks with antagonistic interactions, IEEE Trans. Autom. Control, 58, 935-946 (2013) · Zbl 1369.93433
[14] Qin, J.; Fu, W.; Zheng, W. X.; Gao, H., On the bipartite consensus for generic linear multiagent systems with input saturation, IEEE Trans. Cybern., 47, 8, 807-818 (2017)
[15] Hu, J.; Wu, Y.; Li, T.; Ghosh, B. K., Consensus control of general linear multi-agent systems with antagonistic interactions and communication noises, IEEE Trans. Autom. Control, 64, 5, 2122-2127 (2019) · Zbl 1482.93040
[16] Peng, Z.; Hu, J.; Shi, K.; Luo, R.; Huang, R.; Ghosh, B. K.; Huang, J., A novel optimal bipartite consensus control scheme for unknown multi-agent systems via model-free reinforcement learning, Appl. Math. Comput., 369 (2020) · Zbl 1433.93008
[17] Peng, Z.; Zhao, Y.; Hu, J.; Luo, R.; Ghosh, B. K.; Nguang, S. K., Input-Output Data-Based Output Antisynchronization Control of Multiagent Systems Using Reinforcement Learning Approach, IEEE Trans. Ind. Informat., 17, 11, 7359-7367 (2021)
[18] Li, K.; Ji, L.; Zhang, C.; Li, H., Fully distributed event-triggered pinning group consensus control for heterogeneous multi-agent systems with cooperative-competitive interaction strength, Neurocomputing, 464, 273-281 (2021)
[19] Guo, G.; Kang, J.; Li, R.; Yang, G., Distributed model reference adaptive optimization of disturbed multiagent systems with intermittent communications, IEEE Trans. Cybern., 52, 6, 5464-5473 (2022)
[20] Guo, G.; Kang, J., Distributed Optimization of Multiagent Systems Against Unmatched Disturbances: A Hierarchical Integral Control Framework, IEEE Trans. Syst. Man Cybern., Syst., 52, 6, 3556-3567 (2022)
[21] Guo, G.; Zhang, R., Lyapunov redesign-based optimal consensus control for multi-agent systems with uncertain dynamics, IEEE Trans. Circuits Syst. II, 69, 6, 2902-2906 (2022)
[22] Lewis, F. L.; Vamvoudakis, K. G., Reinforcement learning for partially observable dynamic processes: Adaptive dynamic programming using measured output data, IEEE Trans. Syst. Man Cybern. Part B Cybern., 41, 1, 14-25 (2011)
[23] Wen, G.; Chen, C.; Ge, S.; Yang, H.; Liu, X., Optimized adaptive nonlinear tracking control using actor-critic reinforcement learning strategy, IEEE Trans. Ind. Informat., 15, 9, 4969-4977 (2019)
[24] Peng, Z.; Zhao, Y.; Hu, J.; Ghosh, B. K., Data-driven optimal tracking control of discrete-time multi-agent systems with two-stage policy iteration algorithm, Inf. Sci., 481, 189-202 (2019) · Zbl 1451.93228
[25] Rui, H.; Lizhi, C.; Xuhui, B.; Junqi, Y., Distributed formation control for multiple non-holonomic wheeled mobile robots with velocity constraint by using improved data-driven iterative learning, Appl. Math. Comput., 395 (2021) · Zbl 1508.70010
[26] Lewis, F. L.; Vrabie, D.; Vamvoudakis, K. G., Reinforcement Learning and Feedback Control: Using Natural Decision Methods to Design Optimal Adaptive Controllers, IEEE Contr. Syst. Mag., 32, 6, 76-105 (2012) · Zbl 1395.93584
[27] Peters, J.; Bagnell, J. A., Policy Gradient Methods, 774-776 (2010), Springer: Springer Boston, MA, USA
[28] Luo, B.; Liu, D.; Wu, H.-N.; Wang, D.; Lewis, F. L., Policy gradient adaptive dynamic programming for data-based optimal control, IEEE Trans. Cybern., 47, 10, 3341-3354 (2017)
[29] Yang, X.; Zhang, H.; Wang, Z., Data-Based Optimal Consensus Control for Multiagent Systems With Policy Gradient Reinforcement Learning, IEEE Trans. Neural Netw. Learn. Syst. (2021)
[30] Lin, M.; Zhao, B.; Liu, D., Policy Gradient Adaptive Critic Designs for Model Free Optimal Tracking Control With Experience Replay, IEEE Trans. Syst. Man Cybern. Syst. (2021)
[31] Lillicrap, T. P., et al., Continuous control with deep reinforcement learning, Proc. Int. Conf. Learn. Represent. (2016)
[32] Abu-Khalaf, M.; Lewis, F. L., Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach, Automatica, 41, 5, 779-791 (2005) · Zbl 1087.49022