
Neighbor Q-learning based consensus control for discrete-time multi-agent systems. (English) Zbl 1531.93396

Summary: A neighbor Q-learning based consensus control algorithm is developed for discrete-time multi-agent systems in this article. To realize the proposed algorithm, a new actor-critic architecture is employed for each agent: the critic network of each agent approximates its Q-function, while the actor network produces the control signal by minimizing that Q-function. To account for the distributed structure of the systems, the Q-functions of each agent's neighbors are incorporated into the update procedure of the critic network, which stabilizes the learning process and avoids the overestimation problem. Convergence properties and a stability analysis for the proposed algorithm are provided, and different discount factors corresponding to various communication topologies are discussed in the convergence analysis. The algorithm does not require an accurate system model, which is often too intricate to build for practical systems. Finally, three simulation examples with different discount factors demonstrate the effectiveness of the consensus control algorithm.
{© 2022 John Wiley & Sons Ltd.}
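The summary outlines, but does not state, the neighbor-augmented critic update. As a purely illustrative sketch (the symbols below — neighbor set \(\mathcal{N}_i\), local cost \(r_i\), actor policy \(\mu_i\), state \(x_k\), control \(u_k\), and discount factor \(\gamma\) — are assumptions made here for illustration, not necessarily the paper's notation), a temporal-difference target for agent \(i\) that mixes in neighbors' Q-functions could take the form
\[
y_i^k \;=\; r_i(x_k,u_k) \;+\; \frac{\gamma}{\lvert\mathcal{N}_i\rvert+1}\Bigl(Q_i\bigl(x_{k+1},\mu_i(x_{k+1})\bigr)\;+\;\sum_{j\in\mathcal{N}_i} Q_j\bigl(x_{k+1},\mu_j(x_{k+1})\bigr)\Bigr),
\]
after which the critic of agent \(i\) is trained to minimize \(\bigl(Q_i(x_k,u_k)-y_i^k\bigr)^2\) and the actor is trained so that \(\mu_i(x)\approx\arg\min_u Q_i(x,u)\), matching the summary's description of the actor as a minimizer of the Q-function. Blending neighbors' value estimates into the target dilutes any single critic's optimistic errors, in the same spirit as the double estimator of double Q-learning, which is consistent with the stated goals of stabilizing the learning process and avoiding overestimation.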

MSC:

93D50 Consensus
93C55 Discrete-time control/observation systems
93A16 Multi-agent systems
Full Text: DOI
