
Efficient off-policy Q-learning for multi-agent systems by solving dual games. (English) Zbl 1533.93040

Summary: This article develops distributed optimal control policies via Q-learning for multi-agent systems (MASs) by solving dual games. Based on game theory, the distributed consensus problem is first formulated as a multi-player non-zero-sum game, in which each agent is viewed as a player concerned only with its local performance and the whole MAS reaches a Nash equilibrium. Second, for each agent, the anti-disturbance problem is formulated as a two-player zero-sum game, in which the control input and the external disturbance are a pair of opponents. Specifically, (1) an offline, data-driven, off-policy distributed tracking algorithm based on the momentum policy gradient (MPG) is developed, which effectively achieves consensus of MASs with a guaranteed \(l_2\)-bounded synchronization error; (2) an actor-critic-disturbance neural network is employed to implement the MPG algorithm and obtain the optimal policies. Finally, numerical and practical simulations are conducted to verify the effectiveness of the tracking policies developed via the MPG algorithm.
© 2024 John Wiley & Sons Ltd.
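
The record does not reproduce the paper's formulas; the following display is only a minimal sketch of the standard performance index used in such dual-game settings (the weights \(Q_i\), \(R_{ii}\), attenuation level \(\gamma_i\), and discount factor \(\gamma\) are illustrative, not the paper's notation). For agent \(i\) with neighborhood synchronization error \(\delta_i\), control input \(u_i\), and external disturbance \(w_i\),
\[
J_i=\sum_{k=0}^{\infty}\gamma^{k}\bigl(\delta_i^{\top}(k)Q_i\delta_i(k)+u_i^{\top}(k)R_{ii}u_i(k)-\gamma_i^{2}\,w_i^{\top}(k)w_i(k)\bigr).
\]
The multi-player non-zero-sum game seeks controls \(u_1^{*},\dots,u_N^{*}\) forming a Nash equilibrium, i.e. \(J_i(u_i^{*},u_{-i}^{*})\le J_i(u_i,u_{-i}^{*})\) for every agent \(i\), while each agent's two-player zero-sum game solves \(\min_{u_i}\max_{w_i}J_i\), the \(H_\infty\)-type attenuation condition behind the guaranteed \(l_2\)-bounded synchronization error.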
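As a reading aid only, the following Python sketch illustrates the generic momentum-smoothed policy-gradient update that the summary's MPG algorithm is built on; the function mpg_step, the hyperparameters lr and beta, and the placeholder gradient stream are assumptions for illustration, since the record gives no implementation details of the paper's off-policy, actor-critic-disturbance realization.

    import numpy as np

    def mpg_step(theta, grad_J, velocity, lr=1e-3, beta=0.9):
        # Momentum accumulates successive gradient estimates of the local
        # cost J_i, smoothing the update direction; this smoothing is what
        # a momentum policy gradient adds over plain gradient descent.
        velocity = beta * velocity + (1.0 - beta) * grad_J
        theta = theta - lr * velocity  # descend the local performance index
        return theta, velocity

    # Usage with a placeholder stream of off-policy gradient estimates:
    theta = np.zeros(8)                 # actor weights (illustrative size)
    velocity = np.zeros_like(theta)
    rng = np.random.default_rng(0)
    for grad in rng.normal(scale=0.1, size=(100, 8)):
        theta, velocity = mpg_step(theta, grad, velocity)

In the paper's setting the gradient estimates would come from stored trajectories rather than the synthetic stream above, which is what makes the scheme off-policy and data-driven.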

MSC:

93A16 Multi-agent systems
91A05 2-person games
91A10 Noncooperative games
91A80 Applications of game theory
