
Model-free distributed optimal control for general discrete-time linear systems using reinforcement learning. (English) Zbl 1537.93669

Summary: This article proposes a novel data-driven framework for distributed optimal consensus of discrete-time linear multi-agent systems under general digraphs. A fully distributed control protocol is designed via the linear quadratic regulator (LQR) approach and is proved, by means of dynamic programming and the minimum principle, to be both a necessary and sufficient condition for optimal control of the multi-agent system. Moreover, the protocol can be constructed from local information with the aid of the solution of the algebraic Riccati equation (ARE). Based on the Q-learning method, a reinforcement learning framework is presented that finds the solution of the ARE in a data-driven way: only data collected from a single, arbitrarily chosen follower are needed to learn the feedback gain matrix. The multi-agent system can therefore achieve distributed optimal consensus even when the system dynamics and global information are completely unavailable. For the output-feedback case, an accurate state estimator is established so that optimal consensus control is still realized. The data-driven optimal consensus method designed in this article applies to any general digraph that contains a directed spanning tree. Finally, numerical simulations verify the effectiveness of the proposed optimal control protocols and the data-driven framework.
© 2024 John Wiley & Sons Ltd.
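The data-driven step described above rests on Q-learning for the discrete-time LQR problem: the quadratic Q-function of the current policy is estimated from trajectory data by least squares, and the feedback gain is then updated greedily, so the ARE is solved without knowledge of the system matrices. The following is a minimal single-agent Python sketch of this idea; the matrices A and B, the cost weights, the trajectory length, and the exploration noise are assumptions made for illustration and do not reproduce the paper's multi-agent construction.

    import numpy as np

    # Illustrative sketch only: model-free Q-learning (policy iteration)
    # for a single discrete-time LQR problem.  A and B generate the data
    # but are never used by the learner itself.
    np.random.seed(0)
    A = np.array([[0.9, 0.1],
                  [0.0, 0.8]])          # Schur stable, so K = 0 is stabilizing
    B = np.array([[0.0],
                  [1.0]])
    Qc, Rc = np.eye(2), np.eye(1)       # stage cost x'Qc x + u'Rc u
    n, m = 2, 1

    def phi(z):
        # Regressor for z'Hz with H symmetric: off-diagonal terms count twice.
        i, j = np.triu_indices(len(z))
        return np.where(i == j, 1.0, 2.0) * z[i] * z[j]

    K = np.zeros((m, n))                # initial stabilizing gain
    for _ in range(8):                  # policy iteration
        Phi, y, x = [], [], np.random.randn(n)
        for _ in range(80):             # collect one exploring trajectory
            u = K @ x + 0.3 * np.random.randn(m)
            x1 = A @ x + B @ u
            z, z1 = np.concatenate([x, u]), np.concatenate([x1, K @ x1])
            # Bellman identity:  z'Hz - z1'Hz1 = x'Qc x + u'Rc u
            Phi.append(phi(z) - phi(z1))
            y.append(x @ Qc @ x + u @ Rc @ u)
            x = x1
        h, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
        H = np.zeros((n + m, n + m))
        H[np.triu_indices(n + m)] = h
        H = H + H.T - np.diag(np.diag(H))          # recover symmetric H
        K = -np.linalg.solve(H[n:, n:], H[n:, :n]) # greedy gain update

    print("learned feedback gain K =", K)

In the article's setting, the analogous Bellman identity is learned from the measurements of one arbitrary follower, and the resulting gain is then embedded in the distributed protocol through local relative-state information.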

MSC:

93D50 Consensus
49L20 Dynamic programming in optimal control and differential games
93C55 Discrete-time control/observation systems
93A16 Multi-agent systems
93C05 Linear systems in control theory
Full Text: DOI
