
Learning-based model predictive control under value iteration with finite approximation errors. (English) Zbl 1533.93180

Summary: This paper proposes a novel learning-based model predictive control (LMPC) scheme for discrete-time nonlinear systems. It overcomes the challenge of manually designing the terminal conditions required by traditional MPC and enhances control performance. The scheme employs value iteration (VI) from reinforcement learning (RL) and autonomously designs the terminal cost by iteratively performing value-function learning and policy updates under known dynamics and constraints. In contrast to existing schemes that combine RL with MPC, the proposed scheme explicitly accounts for the approximation errors incurred in each iteration. Further, a rigorous theoretical analysis is provided, covering the convergence of VI as well as the stability and performance of the closed-loop system. In addition, the influences of the prediction horizon and the initial terminal cost on performance are investigated. Simulation results on a linear system verify the theoretical properties of the LMPC and show that it achieves (near-)optimal performance. Moreover, its advantage over traditional MPC is demonstrated in a nonholonomic vehicle regulation example.
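The iterative design described above — value iteration producing a terminal cost that an MPC then uses, with a bounded approximation error entering each sweep — can be sketched on a scalar linear-quadratic stand-in. Everything below (the dynamics `a, b`, the weights `q, r`, the error bound `eps`, and the crude grid-search MPC) is an illustrative assumption for this review, not the paper's actual algorithm.

```python
import numpy as np

# Minimal sketch: approximate value iteration learns a quadratic terminal
# cost V(x) = p * x^2 for the scalar system x+ = a*x + b*u with stage cost
# q*x^2 + r*u^2. All parameters are hypothetical stand-ins.

a, b = 1.2, 1.0   # unstable scalar dynamics (assumed for illustration)
q, r = 1.0, 0.1   # stage-cost weights (assumed)

def value_iteration(p0=0.0, iters=50, eps=1e-3, rng=None):
    """Iterate the Bellman update on p; each sweep adds a bounded
    disturbance of size eps to mimic finite approximation errors."""
    rng = rng or np.random.default_rng(0)
    p = p0
    for _ in range(iters):
        # exact Bellman (scalar Riccati) update for a quadratic value
        p_next = q + a**2 * p - (a * b * p) ** 2 / (r + b**2 * p)
        # finite approximation error in the learned value function
        p = p_next + eps * rng.uniform(-1.0, 1.0)
    return p

def mpc_control(x, p_term, horizon=3):
    """Grid-search MPC over the first input; later inputs follow the
    greedy policy induced by the learned quadratic value function, and
    the learned terminal cost p_term * x_N^2 closes the horizon."""
    best_u, best_cost = 0.0, np.inf
    for u0 in np.linspace(-5.0, 5.0, 201):
        cost = q * x**2 + r * u0**2
        xk = a * x + b * u0
        for _ in range(horizon - 1):
            uk = -(a * b * p_term) / (r + b**2 * p_term) * xk
            cost += q * xk**2 + r * uk**2
            xk = a * xk + b * uk
        cost += p_term * xk**2   # learned terminal cost
        if cost < best_cost:
            best_u, best_cost = u0, cost
    return best_u

p_learned = value_iteration()   # converges near the Riccati fixed point
x = 2.0
for _ in range(20):
    x = a * x + b * mpc_control(x, p_learned)
print(abs(x) < 0.1)  # True: state regulated to a neighborhood of the origin
```

Despite the injected error `eps`, the learned `p` stays near the Riccati fixed point, and the MPC with this terminal cost stabilizes the unstable dynamics — a toy illustration of the robustness-to-approximation-error property the paper analyzes rigorously.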
© 2023 John Wiley & Sons Ltd.

MSC:

93B45 Model predictive control
93C55 Discrete-time control/observation systems
93C10 Nonlinear systems in control theory
68T05 Learning and adaptive systems in artificial intelligence
Full Text: DOI

References:

[1] Raković SV, Levine WS. Handbook of Model Predictive Control. Springer; 2018.
[2] Mayne DQ, Kerrigan EC, Van Wyk E, Falugi P. Tube‐based robust nonlinear model predictive control. Int J Robust Nonlinear Control. 2011;21(11):1341‐1353. · Zbl 1244.93081
[3] Mayne DQ, Falugi P. Stabilizing conditions for model predictive control. Int J Robust Nonlinear Control. 2019;29(4):894‐903. · Zbl 1458.93066
[4] Wan R, Li S, Zheng Y. Model predictive control for nonlinear systems with time‐varying dynamics and guaranteed Lyapunov stability. Int J Robust Nonlinear Control. 2021;31(2):509‐523. · Zbl 1525.93090
[5] Boccia A, Grüne L, Worthmann K. Stability and feasibility of state constrained MPC without stabilizing terminal constraints. Syst Control Lett. 2014;72:14‐21. · Zbl 1297.93068
[6] Mayne DQ, Rawlings JB, Rao CV, Scokaert PO. Constrained model predictive control: stability and optimality. Automatica. 2000;36(6):789‐814. · Zbl 0949.93003
[7] Chen H, Allgöwer F. A quasi‐infinite horizon nonlinear model predictive control scheme with guaranteed stability. Automatica. 1998;34(10):1205‐1217. · Zbl 0947.93013
[8] Yang Y, Vamvoudakis KG, Modares H. Safe reinforcement learning for dynamical games. Int J Robust Nonlinear Control. 2020;30(9):3706‐3726. · Zbl 1466.91038
[9] Guo L, Rizvi SAA, Lin Z. Optimal control of a two‐wheeled self‐balancing robot by reinforcement learning. Int J Robust Nonlinear Control. 2021;31(6):1885‐1904. · Zbl 1526.93175
[10] Xu J, Wang J, Rao J, Zhong Y, Wang H. Adaptive dynamic programming for optimal control of discrete‐time nonlinear system with state constraints based on control barrier function. Int J Robust Nonlinear Control. 2022;32(6):3408‐3424. · Zbl 1527.93233
[11] Guo Z, Yao D, Bai W, Li H, Lu R. Event‐triggered guaranteed cost fault‐tolerant optimal tracking control for uncertain nonlinear system via adaptive dynamic programming. Int J Robust Nonlinear Control. 2021;31(7):2572‐2592. · Zbl 1526.93151
[12] Wei Q, Wang FY, Liu D, Yang X. Finite‐approximation‐error‐based discrete‐time iterative adaptive dynamic programming. IEEE Trans Cybern. 2014;44(12):2820‐2833.
[13] Lowrey K, Rajeswaran A, Kakade S, Todorov E, Mordatch I. Plan online, learn offline: efficient learning and exploration via model‐based control. arXiv preprint arXiv:1811.01848. 2018.
[14] Farshidian F, Hoeller D, Hutter M. Deep value model predictive control. arXiv preprint arXiv:1910.03358. 2019.
[15] Bhardwaj M, Handa A, Fox D, Boots B. Information theoretic model predictive Q‐learning. PMLR; 2020:840‐850.
[16] Karnchanachari N, Valls MI, Hoeller D, Hutter M. Practical reinforcement learning for MPC: learning from sparse objectives in under an hour on a real robot. Learning for Dynamics and Control. PMLR; 2020:211‐224.
[17] Gros S, Zanon M. Data‐driven economic NMPC using reinforcement learning. IEEE Trans Automat Contr. 2019;65(2):636‐648. · Zbl 1533.93176
[18] Zanon M, Gros S. Safe reinforcement learning using robust MPC. IEEE Trans Automat Contr. 2020;66(8):3638‐3652. · Zbl 1471.93093
[19] Sawant S, Anand AS, Reinhardt D, Gros S. Learning‐based MPC from big data using reinforcement learning. arXiv preprint arXiv:2301.01667. 2023.
[20] Kordabad AB, Gros S. Q‐learning of the storage function in economic nonlinear model predictive control. Eng Appl Artif Intel. 2022;116:105343.
[21] Gros S, Zanon M. Economic MPC of Markov decision processes: dissipativity in undiscounted infinite‐horizon optimal control. Automatica. 2022;146:110602. · Zbl 1504.93086
[22] Esfahani HN, Gros S. Policy gradient reinforcement learning for uncertain polytopic LPV systems based on MHE‐MPC. IFAC‐PapersOnLine. 2022;55(15):1‐6.
[23] Esfahani HN, Kordabad AB, Gros S. Approximate robust NMPC using reinforcement learning. Proceedings of European Control Conference; 2021:132‐137.
[24] Esfahani HN, Kordabad AB, Gros S. Reinforcement learning based on MPC/MHE for unmodeled and partially observable dynamics. Proceedings of American Control Conference; 2021:2121‐2126.
[25] Kordabad AB, Zanon M, Gros S. Equivalence of optimality criteria for Markov decision process and model predictive control. IEEE Trans Automat Contr. 2023. doi:10.1109/TAC.2023.3277309 · Zbl 07884272
[26] Lin M, Sun Z, Xia Y, Zhang J. Reinforcement learning‐based model predictive control for discrete‐time systems. IEEE Trans Neural Netw Learn Syst. 2023. doi:10.1109/TNNLS.2023.3273590
[27] Kirk DE. Optimal Control Theory: An Introduction. Dover; 2004.
[28] Limón D, Alamo T, Salas F, Camacho EF. On the stability of constrained MPC without terminal constraint. IEEE Trans Automat Contr. 2006;51(5):832‐836. · Zbl 1366.93483
[29] Grüne L. NMPC without terminal constraints. IFAC Proc Vol. 2012;45(17):1‐13.
[30] Grüne L, Rantzer A. On the infinite horizon performance of receding horizon controllers. IEEE Trans Automat Contr. 2008;53(9):2100‐2111. · Zbl 1367.90109
[31] Rawlings JB, Mayne DQ. Model Predictive Control: Theory and Design. Nob Hill Pub; 2009.
[32] Hu B, Linnemann A. Toward infinite‐horizon optimality in nonlinear model predictive control. IEEE Trans Automat Contr. 2002;47(4):679‐682. · Zbl 1364.93339
[33] Lederer A, Umlauft J, Hirche S. Uniform error and posterior variance bounds for Gaussian process regression with application to safe control. arXiv preprint arXiv:2101.05328. 2021.
[34] Lederer A, Umlauft J, Hirche S. Uniform error bounds for Gaussian process regression with application to safe control. arXiv preprint arXiv:1906.01376. 2019.
[35] Williams CK, Rasmussen CE. Gaussian Processes for Machine Learning. Vol 2. MIT Press; 2006. · Zbl 1177.68165
[36] Pensoneault A, Yang X, Zhu X. Nonnegativity‐enforced Gaussian process regression. Theor Appl Mech Lett. 2020;10(3):182‐187.
[37] Deisenroth MP, Faisal AA, Ong CS. Mathematics for Machine Learning. Cambridge University Press; 2020. · Zbl 1491.68002
[38] Worthmann K, Mehrez MW, Zanon M, Mann GK, Gosine RG, Diehl M. Model predictive control of nonholonomic mobile robots without stabilizing constraints and costs. IEEE Trans Control Syst Technol. 2015;24(4):1394‐1406.
[39] Mehrez MW, Worthmann K, Cenerini JP, Osman M, Melek WW, Jeon S. Model predictive control without terminal constraints or costs for holonomic mobile robots. Robot Autonom Syst. 2020;127:103468.
[40] Brockett RW. Asymptotic stability and feedback stabilization. Differ Geometr Control Theory. 1983;27(1):181‐191. · Zbl 0528.93051
[41] Zhu Y, Ozguner U. Robustness analysis on constrained model predictive control for nonholonomic vehicle regulation. Proceedings of American Control Conference; 2009:3896‐3901.
[42] Kerrigan EC, Maciejowski JM. Invariant sets for constrained nonlinear discrete‐time systems with application to feasibility in model predictive control. Proceedings of IEEE Conference on Decision and Control. IEEE; 2000:4951‐4956.