Document Zbl 1531.93210

Jha, Mayank Shekhar; Theilliol, Didier; Weber, Philippe

Model-free optimal tracking over finite horizon using adaptive dynamic programming. (English) Zbl 1531.93210

Optim. Control Appl. Methods 44, No. 6, 3114-3138 (2023).

Summary: Adaptive dynamic programming (ADP) based approaches are effective for solving the nonlinear Hamilton-Jacobi-Bellman (HJB) equation approximately. This paper develops a novel ADP-based approach that solves the optimal tracking problem for completely unknown discrete-time systems over a finite horizon, with a focus on minimizing consecutive changes in the control inputs. To that end, the cost function accounts for tracking performance, energy consumption and, as a novelty, consecutive changes in the control inputs. Through a suitable system transformation, the optimal tracking problem is transformed into a regulation problem with respect to the state tracking error. This leads to a novel finite-horizon performance index function and a corresponding nonlinear HJB equation, which is solved approximately using a novel iterative ADP-based algorithm. A suitable neural network-based structure is proposed to learn the initial admissible one-step zero control law. The proposed iterative ADP is implemented using the heuristic dynamic programming technique based on an actor-critic neural network structure. Finally, simulation studies are presented to illustrate the effectiveness of the proposed algorithm.
© 2023 John Wiley & Sons Ltd.
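
The three cost terms named in the summary (tracking performance, energy consumption, consecutive changes in the control inputs) can be illustrated with a small sketch. The summary does not give the paper's exact performance index, so the quadratic form, the weight matrices `Q`, `R`, `S`, the initial input `u_init` and the function name `finite_horizon_cost` are all assumptions made for illustration only.

```python
import numpy as np


def finite_horizon_cost(errors, controls, Q, R, S, u_init=None):
    """Sum an assumed quadratic stage cost over a finite horizon.

    Assumed stage cost (not taken from the paper):
        e_k' Q e_k + u_k' R u_k + (u_k - u_{k-1})' S (u_k - u_{k-1})
    """
    errors = np.asarray(errors, dtype=float)      # shape (N, n): tracking errors e_k
    controls = np.asarray(controls, dtype=float)  # shape (N, m): control inputs u_k
    if u_init is None:
        # Assumption: the input applied before the horizon starts is zero.
        u_init = np.zeros(controls.shape[1])

    # Consecutive changes du_k = u_k - u_{k-1}, with u_{-1} = u_init.
    previous = np.vstack([u_init, controls[:-1]])
    increments = controls - previous

    total = 0.0
    for e, u, du in zip(errors, controls, increments):
        total += e @ Q @ e + u @ R @ u + du @ S @ du
    return float(total)


# Two-step horizon, scalar error and scalar input (illustrative numbers).
example_cost = finite_horizon_cost(
    errors=[[1.0], [0.5]],
    controls=[[2.0], [1.0]],
    Q=np.array([[1.0]]),
    R=np.array([[0.1]]),
    S=np.array([[0.5]]),
)
```

Raising the hypothetical weight `S` relative to `R` penalizes abrupt changes between successive inputs more heavily than input magnitude, which is the trade-off the summary highlights as the novelty.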

Cited in 1 Document

MSC:

93C40	Adaptive control/observation systems
49L20	Dynamic programming in optimal control and differential games
49L12	Hamilton-Jacobi equations in optimal control and differential games

Keywords:

actor critic; adaptive dynamic programming; model free; neural networks; nonlinear Hamilton Jacobi Bellman; optimal tracking


