Off-policy integral reinforcement learning-based optimal tracking control for a class of nonzero-sum game systems with unknown dynamics. (English) Zbl 1531.93177

Summary: This article studies the optimal tracking control problem for a class of multi-input nonlinear systems with unknown dynamics, based on reinforcement learning (RL) and nonzero-sum game theory. First, an augmented system composed of the tracking error dynamics and the command generator dynamics is constructed. Then, the tracking coupled Hamilton-Jacobi (HJ) equations associated with a discounted cost function are derived; their solution gives the Nash equilibrium, whose existence is proved. To approximate the solution of the tracking coupled HJ equations, two model-based policy iteration (PI) algorithms are given, and their equivalence and convergence are analyzed. Furthermore, to remove the need for prior knowledge of the system dynamics, an off-policy integral reinforcement learning (OP-IRL) algorithm implemented by neural networks (NNs) is proposed. The weights of the critic and actor NNs are updated simultaneously by gradient descent. The convergence of the NN weights and the stability of the closed-loop error system are proved. Finally, numerical simulation results demonstrate the effectiveness of the proposed OP-IRL method.
{© 2022 John Wiley & Sons Ltd.}
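
To make the setting concrete, the following is a plausible formulation of the augmented system and discounted cost described in the summary, following the standard multi-player tracking literature; the paper's exact notation and definitions may differ. For $N$ players with dynamics and reference generator

$$\dot{x} = f(x) + \sum_{j=1}^{N} g_j(x)\,u_j, \qquad \dot{x}_d = h(x_d), \qquad e = x - x_d,$$

the augmented state $X = [e^\top,\ x_d^\top]^\top$ obeys $\dot{X} = F(X) + \sum_{j=1}^{N} G_j(X)\,u_j$, and each player $i$ minimizes the discounted cost

$$J_i(X_0) = \int_0^\infty e^{-\gamma t}\Big( X^\top Q_i X + \sum_{j=1}^{N} u_j^\top R_{ij}\, u_j \Big)\, dt, \qquad \gamma > 0.$$

The corresponding tracking coupled HJ equation for the value function $V_i$ then reads

$$0 = X^\top Q_i X + \sum_{j=1}^{N} u_j^{*\top} R_{ij}\, u_j^* - \gamma V_i + \nabla V_i^\top \Big( F(X) + \sum_{j=1}^{N} G_j(X)\, u_j^* \Big), \qquad u_i^* = -\tfrac{1}{2} R_{ii}^{-1} G_i(X)^\top \nabla V_i,$$

whose simultaneous solution for $i = 1, \dots, N$ characterizes the Nash equilibrium.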
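
As an illustration of the gradient-descent weight update mentioned in the summary, below is a minimal single-player sketch (in Python) of one critic step on the integral (interval) Bellman residual. The feature basis, constants, and function names are assumptions for illustration only; the paper's actual OP-IRL law updates the critic and actor NNs of all N players simultaneously and is more elaborate.

    import numpy as np

    # Hypothetical critic features for the augmented state X (quadratic basis; an assumption).
    def phi(X):
        x1, x2 = X
        return np.array([x1**2, x1 * x2, x2**2])

    gamma = 0.1    # discount rate (assumed value)
    T     = 0.05   # IRL integration interval (assumed value)
    alpha = 0.5    # critic learning rate (assumed value)

    W = np.zeros(3)  # critic weights, V(X) ~= W @ phi(X)

    def critic_update(W, X_t, X_tT, rho):
        """One gradient-descent step on the squared interval Bellman residual.

        rho is the integral of the discounted stage cost over [t, t+T],
        measured along data collected under an arbitrary behavior policy
        (the off-policy feature: no model of f, g_j is required).
        """
        # residual of the interval form of the Bellman/HJ equation
        delta = np.exp(-gamma * T) * W @ phi(X_tT) - W @ phi(X_t) + rho
        grad = delta * (np.exp(-gamma * T) * phi(X_tT) - phi(X_t))
        # normalized gradient step for numerical robustness
        return W - alpha * grad / (1.0 + grad @ grad)

    # usage on one recorded transition (numbers are illustrative only)
    W = critic_update(W, X_t=np.array([1.0, -0.5]),
                      X_tT=np.array([0.9, -0.45]), rho=0.02)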

MSC:

93C10 Nonlinear systems in control theory
93C15 Control/observation systems governed by ordinary differential equations
68T05 Learning and adaptive systems in artificial intelligence
91A80 Applications of game theory