
A set-based model-free reinforcement learning design technique for nonlinear systems. (English) Zbl 1417.93138

Summary: In this study, we propose an extremum-seeking approach to approximating the solution of optimal control problems for a class of unknown nonlinear dynamical systems. The technique combines a phasor extremum-seeking controller with a reinforcement learning strategy: the learning strategy estimates the value function of the optimal control problem of interest, while the phasor extremum-seeking controller implements the resulting approximate optimal control. The approach is shown to provide reasonable approximations of optimal control solutions without requiring a parameterization of the nonlinear system's dynamics. A simulation example demonstrates the effectiveness of the technique.
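As a rough illustration of the extremum-seeking half of the summary, the Python sketch below implements the classic sinusoidal-dither scheme, not the phasor estimator of Atta et al. [22] used in the paper: the input is perturbed with a sinusoid, the measured cost is high-pass filtered and demodulated against the dither to estimate the local gradient, and an integrator steers the input toward the optimum. The cost function, gains, and function name here are illustrative assumptions, not taken from the paper.

import math

def extremum_seek(cost, u0=0.0, a=0.5, omega=10.0, wh=1.0, k=2.0, dt=1e-3, T=50.0):
    """Drive u toward a local minimum of an unknown static cost J(u)."""
    u_hat = u0    # current estimate of the optimizer
    lp = 0.0      # low-pass filter state; y - lp approximates the high-passed cost
    for n in range(int(T / dt)):
        t = n * dt
        dither = a * math.sin(omega * t)
        y = cost(u_hat + dither)       # probe the (unknown) cost at the dithered input
        lp += dt * wh * (y - lp)       # first-order low-pass of the measured cost
        grad_est = (y - lp) * dither   # demodulation: average is proportional to J'(u_hat)
        u_hat -= dt * k * grad_est     # integrate the gradient estimate (minimization)
    return u_hat

# Hypothetical quadratic cost with minimum at u* = 2; the estimate settles near 2.0.
print(extremum_seek(lambda u: (u - 2.0) ** 2 + 1.0))

In the setting the summary describes, the static cost probed here would instead be the value-function estimate produced by the reinforcement learning strategy, so that the extremum-seeking loop realizes the approximate optimal controller.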

MSC:

93B52 Feedback control
93C10 Nonlinear systems in control theory
49N90 Applications of optimal control and differential games

References:

[1] Sutton RS, Barto AG. Reinforcement Learning: An Introduction. Cambridge, MA: The MIT Press; 1998.
[2] Watkins CJCH, Dayan P. Q‐learning. Mach Learn. 1992;8(3‐4):279‐292. · Zbl 0773.68062
[3] Barto AG, Sutton RS, Anderson CW. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Trans Syst Man Cybern. 1983;SMC‐13(5):834‐846.
[4] Bertsekas DP, Tsitsiklis JN. Neuro‐dynamic programming: an overview. In: Proceedings of the 34th IEEE Conference on Decision and Control; 1995; New Orleans, LA.
[5] Busoniu L, Babuska R, Schutter BD, Ernst D. Reinforcement Learning and Dynamic Programming Using Function Approximators. Boca Raton, FL: CRC Press; 2010.
[6] Mehta P, Meyn S. Q‐learning and Pontryagin's minimum principle. In: Proceedings of the 48th IEEE Conference on Decision and Control (CDC) held jointly with the 2009 28th Chinese Control Conference; 2009; Shanghai, China.
[7] Sutton RS, Barto AG, Williams RJ. Reinforcement learning is direct adaptive optimal control. IEEE Control Syst. 1992;12(2):19‐22.
[8] Bradtke SJ, Ydstie BE, Barto AG. Adaptive linear quadratic control using policy iteration. Paper presented at: American Control Conference; 1994; Baltimore, MD.
[9] Sutton RS. Generalization in reinforcement learning: successful examples using sparse coarse coding. In: Advances in Neural Information Processing Systems. Cambridge, MA: The MIT Press; 1996:1038‐1044.
[10] Lewis FL, Vrabie D. Reinforcement learning and adaptive dynamic programming for feedback control. IEEE Circuits Syst Mag. 2009;9(3):32‐50.
[11] Werbos PJ. Approximate dynamic programming for real‐time control and neural modeling. In: Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches. New York, NY: Van Nostrand Reinhold; 1992:493‐525.
[12] Bhasin S, Kamalapurkar R, Johnson M, Vamvoudakis KG, Lewis FL, Dixon WE. A novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear systems. Automatica. 2013;49(1):82‐92. · Zbl 1257.93055
[13] Vamvoudakis KG, Lewis FL. Online actor‐critic algorithm to solve the continuous‐time infinite horizon optimal control problem. Automatica. 2010;46(5):878‐888. · Zbl 1191.49038
[14] Kiumarsi B, Vamvoudakis KG, Modares H, Lewis FL. Optimal and autonomous control using reinforcement learning: a survey. IEEE Trans Neural Netw Learn Syst. 2018;29(6):2042‐2062.
[15] Kamalapurkar R, Walters P, Rosenfeld J, Dixon W. Model‐based reinforcement learning for approximate optimal control. In: Reinforcement Learning for Optimal Feedback Control: A Lyapunov‐Based Approach. Cham, Switzerland: Springer International Publishing AG; 2018:91‐148. · Zbl 1403.49001
[16] Chowdhary G, Liu M, Grande R, Walsh T, How J, Carin L. Off‐policy reinforcement learning with Gaussian processes. IEEE/CAA J Autom Sin. 2014;1(3):227‐238.
[17] Benosman M. Model‐based vs data‐driven adaptive control: an overview. Int J Adapt Control Signal Process. 2018;32(5):753‐776. · Zbl 1396.93068
[18] Guay M, Dochain D, Perrier M. Adaptive extremum seeking control of continuous stirred tank bioreactors with unknown growth kinetics. Automatica. 2004;40(5):881‐888. · Zbl 1050.93055
[19] Hudon N, Guay M, Perrier M, Dochain D. Adaptive extremum‐seeking control of convection‐reaction distributed reactor with limited actuation. Comput Chem Eng. 2008;32(12):2994‐3001.
[20] Poveda JI, Vamvoudakis KG, Benosman M. A neuro‐adaptive architecture for extremum seeking control using hybrid learning dynamics. Paper presented at: American Control Conference (ACC); 2017; Seattle, WA.
[21] Adetola V, Guay M, Lehrer D. Adaptive estimation for a class of nonlinearly parameterized dynamical systems. IEEE Trans Autom Control. 2014;59(10):2818‐2824. · Zbl 1360.93662
[22] Atta KT, Johansson A, Gustafsson T. Extremum seeking control based on phasor estimation. Syst Control Lett. 2015;85:37‐45. · Zbl 1322.93102
[23] Tan Y, Nešić D, Mareels I. On non‐local stability properties of extremum seeking control. Automatica. 2006;42(6):889‐903. · Zbl 1117.93362
[24] Kreisselmeier G. Adaptive observers with exponential rate of convergence. IEEE Trans Autom Control. 1977;22(1):2‐8. · Zbl 0346.93043
[25] Khalil HK. Nonlinear Systems. New York, NY: Macmillan Publishing Company; 1992. · Zbl 0969.34001
[26] Teel AR, Peuteman J, Aeyels D. Semi‐global practical asymptotic stability and averaging. Syst Control Lett. 1999;37(5):329‐334. · Zbl 0948.93049