
Adaptive dynamic programming for optimal control of discrete-time nonlinear system with state constraints based on control barrier function. (English) Zbl 1527.93233

Summary: Adaptive dynamic programming (ADP) methods have demonstrated their efficiency. However, many of the applications for which ADP offers great potential are also safety-critical and must meet safety specifications in the presence of physical constraints. In this article, an optimal controller for discrete-time nonlinear systems with state constraints is proposed. By introducing a control barrier function into the utility function, the state-constrained problem is transformed into an unconstrained optimal control problem, thereby handling state constraints that are difficult to address with traditional ADP methods. The constructed sequence of value functions is shown to be monotonically non-increasing and to converge to the optimal value. Moreover, this article gives a stability proof for the developed algorithm, as well as conditions under which the state constraints are satisfied. To implement and approximate the control-barrier-function-based adaptive dynamic programming algorithm, an actor-critic network structure is built, in which two neural networks are trained separately for approximation. The performance of the proposed method is validated on a simulation example.
{© 2021 John Wiley & Sons Ltd.}
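The barrier-augmented scheme described in the summary can be illustrated with a small sketch: a barrier term on the safe set is added to the stage cost, and value iteration is run on the resulting unconstrained problem. Everything below is an illustrative assumption, not the paper's actual formulation: the scalar dynamics, the log-barrier choice for the control barrier function, the weight `lam`, and the state/action grids are all made up for the example, and the actor-critic networks are replaced by plain tabular value iteration with interpolation.

```python
import numpy as np

# Assumed toy dynamics x_{k+1} = f(x_k, u_k) with safe set |x| < 1.
def f(x, u):
    return 0.9 * x + u

def barrier(x, eps=1e-6):
    # An assumed log-barrier CBF on h(x) = 1 - x^2: finite inside the
    # safe set, blows up as |x| -> 1.
    h = 1.0 - x ** 2
    return -np.log(np.maximum(h, eps))

def utility(x, u, lam=0.1):
    # CBF-augmented stage cost: quadratic cost plus weighted barrier,
    # turning the constrained problem into an unconstrained one.
    return x ** 2 + u ** 2 + lam * barrier(x)

xs = np.linspace(-0.95, 0.95, 97)   # state grid strictly inside the safe set
us = np.linspace(-0.5, 0.5, 41)     # action grid

def value_iteration(n_iter=500, tol=1e-8):
    V = np.zeros_like(xs)
    for _ in range(n_iter):
        Q = np.empty((xs.size, us.size))
        for j, u in enumerate(us):
            xn = np.clip(f(xs, u), xs[0], xs[-1])
            # Bellman backup with linear interpolation of V at x_{k+1}.
            Q[:, j] = utility(xs, u) + np.interp(xn, xs, V)
        V_new = Q.min(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V
```

Because the barrier keeps the augmented cost finite only inside the safe set, the minimizing policy implicitly respects the constraint; in the paper this value function and policy are approximated by critic and actor networks rather than a grid.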

MSC:

93C40 Adaptive control/observation systems
49L20 Dynamic programming in optimal control and differential games
93C55 Discrete-time control/observation systems
93C10 Nonlinear systems in control theory
