A novel optimal bipartite consensus control scheme for unknown multi-agent systems via model-free reinforcement learning. (English) Zbl 1433.93008

Summary: In this paper, the optimal bipartite consensus control (OBCC) problem is investigated for unknown multi-agent systems (MASs) with coopetition networks. A novel distributed OBCC scheme is proposed based on a model-free reinforcement learning method, so that knowledge of the agents’ dynamics is not required. First, coopetition networks are used to model the cooperative and competitive interactions among agents, and the OBCC problem is then formulated by introducing local neighbor bipartite consensus errors and performance index functions (PIFs) for each agent. Second, to obtain the OBCC laws, a policy iteration algorithm (PIA) is employed to learn the solutions to the discrete-time (DT) Hamilton-Jacobi-Bellman (HJB) equations. Third, to implement the proposed method, a data-driven actor-critic neural network (NN) framework is adopted to approximate the control laws and the PIFs, respectively, in an online learning manner. Finally, simulation results are given to demonstrate the effectiveness of the developed approach.
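
The summary does not state the error definition, so the following is only a minimal sketch of what a local neighbor bipartite consensus error commonly looks like on a signed (coopetition) graph. It assumes the widely used form e_i = sum_j |a_ij| (x_i - sgn(a_ij) x_j) with scalar agent states, and it drives the agents with a plain first-order update rather than the policy iteration or actor-critic learning described above. The graph `ADJ`, the step size, and all function names are hypothetical and chosen for illustration.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def bipartite_errors(adj, x):
    """Local neighbor bipartite consensus errors (assumed standard form).

    e_i = sum_j |a_ij| * (x_i - sign(a_ij) * x_j)
    Positive weights model cooperation, negative weights competition.
    """
    n = len(x)
    return [
        sum(abs(adj[i][j]) * (x[i] - sign(adj[i][j]) * x[j]) for j in range(n))
        for i in range(n)
    ]


# Hypothetical structurally balanced graph: agents {0, 1} form one group,
# agents {2, 3} the other; cross-group edges carry negative weights.
ADJ = [
    [0, 1, -1, 0],
    [1, 0, 0, -1],
    [-1, 0, 0, 1],
    [0, -1, 1, 0],
]


def simulate(adj, x0, step=0.2, iters=200):
    """Apply x <- x - step * e repeatedly (illustrative first-order update)."""
    x = list(x0)
    for _ in range(iters):
        e = bipartite_errors(adj, x)
        x = [xi - step * ei for xi, ei in zip(x, e)]
    return x


final = simulate(ADJ, [1.0, 0.5, -0.2, 0.8])
residual = bipartite_errors(ADJ, final)
```

With this particular graph and initial condition, the two groups settle at values of equal magnitude and opposite sign, and the errors in `residual` vanish, which is the bipartite consensus behavior the OBCC problem targets. An optimal scheme would additionally minimize a performance index built from these errors and the control inputs.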

MSC:

93A14 Decentralized systems
05C82 Small world graphs, complex networks (graph-theoretic aspects)
49L20 Dynamic programming in optimal control and differential games
93C55 Discrete-time control/observation systems
93C15 Control/observation systems governed by ordinary differential equations

References:

[1] Anderson, B. D.O.; Fidan, B.; Yu, C.; Walle, D., UAV Formation control: Theory and application (2008), Springer: Springer London · Zbl 1201.93089
[2] Wen, G.; Yu, X.; Liu, Z.; Yu, W., Adaptive consensus-based robust strategy for economic dispatch of smart grids subject to communication uncertainties, IEEE Trans. Ind. Informat., 14, 6, 2484-2496 (2018)
[3] Dong, X.; Yu, B.; Shi, Z.; Zhong, Y., Time-varying formation control for unmanned aerial vehicles: theories and applications, IEEE Trans. Control Syst. Technol., 23, 1, 340-348 (2015)
[4] Wang, P. K.C., Navigation strategies for multiple autonomous mobile robots moving in formation, J. Robotic Syst., 8, 2, 177-195 (2010) · Zbl 0716.70035
[5] Liu, X.; Zhang, K.; Xie, W., Pinning impulsive synchronization of reaction-diffusion neural networks with time-varying delays, IEEE Trans. Neural Netw. Learn. Syst., 28, 5, 1055-1067 (2017)
[6] Shen, H.; Zhu, Y.; Zhang, L.; Park, J. H., Extended dissipative state estimation for Markov jump neural networks with unreliable links, IEEE Trans. Neural Netw. Learn. Syst., 28, 346-358 (2017)
[7] Yu, Z.; Jiang, H.; Mei, X.; Hu, C., Guaranteed cost consensus for second-order multi-agent systems with heterogeneous inertias, Appl. Math. Comput., 338, 739-757 (2018) · Zbl 1427.93034
[8] Tao, J.; Wu, Z. G.; Su, H. Y.; Wu, Y. Q.; Zhang, D., Asynchronous and resilient filtering for Markovian jump neural networks subject to extended dissipativity, IEEE Trans. Cybern., 49, 7, 2504-2513 (2019)
[9] Shi, K.; Wang, J.; Zhong, S.; Zhang, X.; Liu, Y.; Cheng, J., New reliable nonuniform sampling control for uncertain chaotic neural networks under Markov switching topologies, Appl. Math. Comput., 347, 169-193 (2019) · Zbl 1428.92015
[10] Olfati-Saber, R.; Murray, R. M., Consensus problems in networks of agents with switching topology and time-delays, IEEE Trans. Autom. Control, 49, 9, 1520-1533 (2004) · Zbl 1365.93301
[11] Ren, W.; Beard, R. W., Consensus seeking in multiagent systems under dynamically changing interaction topologies, IEEE Trans. Autom. Control, 50, 5, 655-661 (2005) · Zbl 1365.93302
[12] Hong, Y.; Hu, J.; Gao, L., Tracking control for multi-agent consensus with an active leader and variable topology, Automatica, 42, 7, 1177-1182 (2006) · Zbl 1117.93300
[13] Shi, G.; Hong, Y., Global target aggregation and state agreement of nonlinear multi-agent systems with switching topologies, Automatica, 45, 5, 1165-1175 (2009) · Zbl 1162.93308
[14] Hu, J.; Hong, Y., Leader-following coordination of multi-agent systems with coupling time delays, Phys. A Stat. Mech. Appl., 374, 2, 853-863 (2007)
[15] Ye, D.; Chen, M.; Yang, H., Distributed adaptive event-triggered fault-tolerant consensus of multiagent systems with general linear dynamics, IEEE Trans. Cybern., 49, 3, 757-767 (2019)
[16] Ye, D.; Yang, X.; Su, L., Fault-tolerant synchronization control for complex dynamical networks with semi-Markov jump topology, Appl. Math. Comput., 312, 36-48 (2017) · Zbl 1426.93362
[17] Riker, W. H., The Theory of Political Coalitions (1962), Yale University Press: Yale University Press New Haven, Conn, USA
[18] Ware, A., The Dynamics of Two-Party Politics: Party Structures and the Management of Competition, Comparative Politics (2009), Oxford University Press: Oxford University Press Oxford, UK
[19] Altafini, C., Consensus problems on networks with antagonistic interactions, IEEE Trans. Autom. Control, 58, 935-946 (2013) · Zbl 1369.93433
[20] Hu, J.; Zhu, H., Adaptive bipartite consensus on coopetition networks, Phys. D, 307, 14-21 (2015) · Zbl 1364.91130
[21] Meng, D.; Du, M.; Jia, Y., Interval bipartite consensus of networked agents associated with signed digraphs, IEEE Trans. Autom. Control, 61, 12, 3755-3770 (2016) · Zbl 1359.93195
[22] Ma, H.; Liu, D.; Wang, D.; Lou, B., Bipartite output consensus in networked multi-agent systems of high-order power integrators with signed digraph and input noises, Int. J. Syst. Sci., 47, 13, 3116-3131 (2016) · Zbl 1346.93030
[23] Wu, Y.; Liu, L.; Hu, J.; Feng, G., Adaptive antisynchronization of multi-layer reaction-diffusion neural networks, IEEE Trans. Neural Netw. Learn. Syst., 29, 4, 807-818 (2018)
[24] Hu, J.; Wu, Y.; Liu, L.; Feng, G., Adaptive bipartite consensus control of high-order multiagent systems on coopetition networks, Int. J. Robust Nonlin. Control, 28, 7, 2868-2886 (2018) · Zbl 1391.93010
[25] Hu, J.; Zheng, W., Emergent collective behaviors on coopetition networks, Phys. Lett. A, 378, 26-27, 1787-1796 (2014) · Zbl 1342.37082
[26] Shamshirband, S.; Patel, A.; Anuar, N. B.; Kiah, M. L.M.; Abraham, A., Cooperative game theoretic approach using fuzzy Q-learning for detecting and preventing intrusions in wireless sensor networks, Eng. Appl. Artif. Intell., 32, 228-241 (2014)
[27] Kalantari, A.; Kamsin, A.; Shamshirband, S.; Gani, A.; Alinejad-Rokny, H.; Chronopoulos, A. T., Computational intelligence approaches for classification of medical data: state-of-the-art, future challenges and research directions, Neurocomputing, 276, 2-22 (2018)
[28] Fotovatikhah, F.; Herrera, M.; Shamshirband, S.; Chau, K. W.; Ardabili, S. F.; Piran, M. J., Survey of computational intelligence as basis to big flood management: challenges, research directions and future work, Eng. Appl. Comp. Fluid Mech., 12, 1, 411-437 (2018)
[29] Abu-Khalaf, M.; Lewis, F. L., Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach, Automatica, 41, 5, 779-791 (2005) · Zbl 1087.49022
[30] Bhasin, S.; Kamalapurkar, R.; Johnson, M.; Vamvoudakis, K. G.; Lewis, F. L.; Dixon, W. E., A novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear systems, Automatica, 49, 1, 82-92 (2013) · Zbl 1257.93055
[31] Jiang, Z.; Jiang, Y., Robust adaptive dynamic programming for linear and nonlinear systems: an overview, Eur. J. Control, 19, 5, 417-425 (2013) · Zbl 1293.49053
[32] Vamvoudakis, K. G.; Ferraz, H., Model-free event-triggered control algorithm for continuous-time linear systems with optimal performance, Automatica, 87, 412-420 (2018) · Zbl 1378.93083
[33] Liu, D.; Wei, Q., Policy iteration adaptive dynamic programming algorithm for discrete-time nonlinear systems, IEEE Trans. Neural Netw. Learn. Syst., 25, 3, 621-634 (2014)
[34] Al-Tamimi, A.; Lewis, F. L.; Abu-Khalaf, M., Discrete-time nonlinear HJB solution using approximate dynamic programming: convergence proof, IEEE Trans. Syst. Man Cybern. Part B Cybern., 38, 4, 943-949 (2008)
[35] Zhang, H.; Wei, Q.; Luo, Y., A novel infinite-time optimal tracking control scheme for a class of discrete-time nonlinear systems via the greedy HDP iteration algorithm, IEEE Trans. Syst. Man Cybern. Part B Cybern., 38, 4, 937-942 (2008)
[36] Wang, D.; Liu, D.; Wei, Q.; Zhao, D.; Jin, N., Optimal control of unknown nonaffine nonlinear discrete-time systems based on adaptive dynamic programming, Automatica, 48, 8, 1825-1832 (2012) · Zbl 1269.49042
[37] Murray, J. J.; Cox, C. J.; Lendaris, G. G.; Saeks, R., Adaptive dynamic programming, IEEE Trans. Syst. Man Cybern. Part C: Appl. Rev., 32, 2, 140-153 (2002)
[38] Abouheaf, M. I.; Lewis, F. L.; Vamvoudakis, K. G.; Haesaert, S.; Babuska, R., Multi-agent discrete-time graphical games and reinforcement learning solutions, Automatica, 50, 12, 3038-3053 (2014) · Zbl 1367.91032
[39] Zhang, H.; Jiang, H.; Luo, Y.; Xiao, G., Data-driven optimal consensus control for discrete-time multi-agent systems with unknown dynamics using reinforcement learning method, IEEE Trans. Ind. Electron., 64, 5, 4091-4100 (2017)
[40] Peng, Z.; Zhao, Y.; Hu, J.; Ghosh, B. K., Data-driven optimal tracking control of discrete-time multi-agent systems with two-stage policy iteration algorithm, Inf. Sci., 481, 189-202 (2019) · Zbl 1451.93228
[41] Zhong, X.; He, H., GrHDP solution for optimal consensus control of multiagent discrete-time systems, IEEE Trans. Syst. Man Cybern. Syst. (2018)
[42] Bu, X.; Hou, Z.; Zhang, H., Data-driven multiagent systems consensus tracking using model free adaptive control, IEEE Trans. Neural Netw. Learn. Syst., 29, 5, 1514-1524 (2018)
[43] Zhang, H.; Yue, D.; Dou, C.; Zhao, W.; Xie, X., Data-driven distributed optimal consensus control for unknown multiagent systems with input-delay, IEEE Trans. Cybern., 49, 6, 2095-2105 (2019)
[44] Li, J.; Modares, H.; Chai, T.; Lewis, F. L.; Xie, L., Off-policy reinforcement learning for synchronization in multiagent graphical games, IEEE Trans. Neural Netw. Learn. Syst., 28, 10, 2434-2445 (2017)
[45] Peng, Z.; Hu, J.; Ghosh, B. K., Data-driven containment control of discrete-time multi-agent systems via value iteration, Sci. China Inf. Sci. (2018)
[46] Peng, Z.; Zhang, J.; Hu, J.; Huang, R.; Ghosh, B. K., Optimal containment control of continuous-time multi-agent systems with unknown disturbances using data-driven approach, Sci. China Inf. Sci. (2019)
[47] Wu, Y.; Hu, J.; Zhang, Y.; Zeng, Y., Interventional consensus for high-order multi-agent systems with unknown disturbances on coopetition networks, Neurocomputing, 194, 126-134 (2016)
[48] Lewis, F. L.; Liu, D., Reinforcement Learning and Approximate Dynamic Programming for Feedback Control (2013), Wiley: Wiley New York, NY, USA
[49] Si, J.; Wang, Y. T., Online learning control by association and reinforcement, IEEE Trans. Neural Netw., 12, 2, 264-276 (2001)
[50] Chang, X. H.; Yang, G. H., Nonfragile \(H_\infty\) filter design for T-S fuzzy systems in standard form, IEEE Trans. Ind. Electron., 61, 7, 3448-3458 (2014)
[51] Chang, X. H.; Huang, R.; Park, J. H., Robust guaranteed cost control under digital communication channels, IEEE Trans. Ind. Informat. (2019)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.