
Stabilizing reinforcement learning control: a modular framework for optimizing over all stable behavior. (English) Zbl 1537.93689

Summary: We propose a framework for the design of feedback controllers that combines the optimization-driven and model-free advantages of deep reinforcement learning with the stability guarantees provided by using the Youla-Kučera parameterization to define the search domain. Recent advances in behavioral systems theory allow us to construct a data-driven internal model; this enables an alternative realization of the Youla-Kučera parameterization based entirely on input-output exploration data. Perhaps of independent interest, we formulate and analyze the stability of such data-driven models in the presence of noise. The Youla-Kučera approach requires a stable "parameter" for controller design. For the training of reinforcement learning agents, the set of all stable linear operators is given explicitly through a matrix factorization approach. Moreover, a nonlinear extension is given using a neural network to express a parameterized set of stable operators, which enables seamless integration with standard deep learning libraries. Finally, we show how these ideas can also be applied to tune fixed-structure controllers.
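
To make the matrix-factorization idea concrete, the sketch below constructs Schur-stable matrices from unconstrained parameters in the spirit of the SUB factorization of Gillis et al. [10], which writes a stable matrix as A = S^{-1}UBS with S positive definite, U orthogonal, and B symmetric positive semidefinite with spectral norm at most one. The PyTorch realization, the function name stable_matrix, and the particular normalizations are illustrative assumptions, not the authors' implementation.

import torch

def stable_matrix(theta_S, theta_U, theta_B, eps=1e-4):
    """Map three unconstrained n-by-n tensors to a Schur-stable matrix A = S^{-1} U B S."""
    n = theta_S.shape[0]
    I = torch.eye(n)
    S = theta_S @ theta_S.T + eps * I                    # S positive definite
    U, _ = torch.linalg.qr(theta_U)                      # U orthogonal (Q factor of QR)
    P = theta_B @ theta_B.T                              # symmetric positive semidefinite
    B = P / (1.0 + torch.linalg.matrix_norm(P, ord=2))   # ||B||_2 < 1
    # A is similar to U B, so rho(A) = rho(U B) <= ||B||_2 < 1: A is Schur stable.
    return torch.linalg.solve(S, U @ B @ S)

# Example: the theta tensors can be trainable nn.Parameters, so any gradient step
# taken by a reinforcement learning algorithm stays inside the set of stable matrices.
A = stable_matrix(torch.randn(4, 4), torch.randn(4, 4), torch.randn(4, 4))
print(torch.linalg.eigvals(A).abs().max())  # strictly less than 1

Because the map from (theta_S, theta_U, theta_B) to A is differentiable and unconstrained, stability is a property of the parameterization itself rather than a constraint that must be enforced during training.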

MSC:

93D99 Stability of control systems
93B52 Feedback control
68T05 Learning and adaptive systems in artificial intelligence

References:

[1] Anderson, B. D., From Youla-Kucera to identification, adaptive and nonlinear control, Automatica, 34, 12, 1485-1506, 1998 · Zbl 0935.93004
[2] Berberich, J.; Allgöwer, F., A trajectory-based framework for data-driven system analysis and control, (2020 European control conference, 2020, IEEE: IEEE Saint Petersburg, Russia), 1365-1370
[3] Berkenkamp, F.; Turchetta, M.; Schoellig, A.; Krause, A., Safe model-based reinforcement learning with stability guarantees, (Advances in neural information processing systems, Vol. 30, 2017, Curran Associates, Inc.), 1-11
[4] Buşoniu, L.; de Bruin, T.; Tolić, D.; Kober, J.; Palunko, I., Reinforcement learning for control: Performance, stability, and deep approximators, Annual Reviews in Control, 46, 8-28, 2018
[5] Chang, Y.-C.; Gao, S., Stabilizing neural control using self-learned almost Lyapunov critics, (2021 IEEE international conference on robotics and automation, 2021, IEEE), 1803-1809
[6] Coulson, J.; Van Waarde, H. J.; Lygeros, J.; Dörfler, F., A quantitative notion of persistency of excitation and the robust fundamental lemma, IEEE Control Systems Letters, 7, 1243-1248, 2022
[7] Friedrich, S. R.; Buss, M., A robust stability approach to robot reinforcement learning based on a parameterization of stabilizing controllers, (2017 IEEE international conference on robotics and automation, 2017, IEEE: IEEE Singapore, Singapore), 3365-3372
[8] Fujimoto, S.; van Hoof, H.; Meger, D., Addressing function approximation error in actor-critic methods, (Proceedings of the 35th international conference on machine learning. Proceedings of the 35th international conference on machine learning, Proceedings of machine learning research, vol. 80, 2018, PMLR), 1587-1596
[9] Furieri, L.; Zheng, Y.; Papachristodoulou, A.; Kamgarpour, M., An input-output parametrization of stabilizing controllers: Amidst Youla and system level synthesis, IEEE Control Systems Letters, 3, 4, 1014-1019, 2019
[10] Gillis, N.; Karow, M.; Sharma, P., Approximating the nearest stable discrete-time system, Linear Algebra and its Applications, 573, 37-53, 2019 · Zbl 1415.65086
[11] Gros, S.; Zanon, M., Learning for MPC with stability & safety guarantees, Automatica, 146, Article 110598 pp., 2022 · Zbl 1504.93085
[12] Han, M.; Zhang, L.; Wang, J.; Pan, W., Actor-critic reinforcement learning for control with stability guarantee, IEEE Robotics and Automation Letters, 5, 4, 6217-6224, 2020
[13] Jin, M.; Lavaei, J., Stability-certified reinforcement learning: A control-theoretic perspective, IEEE Access, 8, 229086-229100, 2020
[14] Kim, Y.; Lee, J. M., Model-based reinforcement learning for nonlinear optimal control with practical asymptotic stability guarantees, AIChE Journal, 66, 10, 2020
[15] Kretchmar, R. M.; Young, P. M.; Anderson, C. W.; Hittle, D. C.; Anderson, M. L.; Delnero, C. C., Robust reinforcement learning control with static and dynamic stability, International Journal of Robust and Nonlinear Control, 11, 15, 1469-1500, 2001 · Zbl 0994.93054
[16] Lale, S.; Azizzadenesheli, K.; Hassibi, B.; Anandkumar, A., Reinforcement learning with fast stabilization in linear dynamical systems, (Proceedings of the 25th international conference on artificial intelligence and statistics, 2022, PMLR), 5354-5390
[17] Lawrence, N. P.; Forbes, M. G.; Loewen, P. D.; McClement, D. G.; Backström, J. U.; Gopaluni, R. B., Deep reinforcement learning with shallow controllers: An experimental application to PID tuning, Control Engineering Practice, 121, Article 105046 pp., 2022
[18] Lawrence, N. P.; Loewen, P. D.; Forbes, M. G.; Backström, J. U.; Gopaluni, R. B., Almost surely stable deep dynamics, (Advances in neural information processing systems, Vol. 33, 2020, Curran Associates, Inc.), 18942-18953
[19] Lawrence, N. P.; Loewen, P. D.; Wang, S.; Forbes, M. G.; Gopaluni, R. B., A modular framework for stabilizing deep reinforcement learning control, IFAC-PapersOnLine, 56, 2, 8006-8011, 2023, 22nd IFAC World Congress
[20] Markovsky, I.; Dörfler, F., Behavioral systems theory in data-driven analysis, signal processing, and control, Annual Reviews in Control, Article S1367578821000754 pp., 2021
[21] Modares, H.; Lewis, F. L.; Naghibi-Sistani, M.-B., Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems, Automatica, 50, 1, 193-202, 2014 · Zbl 1298.49042
[22] Mukherjee, S.; Vu, T. L., Reinforcement learning of structured stabilizing control for linear systems with unknown state matrix, IEEE Transactions on Automatic Control, 1, 2022
[23] Nian, R.; Liu, J.; Huang, B., A review on reinforcement learning: Introduction and applications in industrial process control, Computers & Chemical Engineering, 139, Article 106886 pp., 2020
[24] Perdomo, J.; Umenberger, J.; Simchowitz, M., Stabilizing dynamical systems via policy gradient methods, (Advances in neural information processing systems, Vol. 34, 2021, Curran Associates, Inc.), 29274-29286
[25] Revay, M.; Wang, R.; Manchester, I. R., Recurrent equilibrium networks: Flexible dynamic models with guaranteed stability and robustness, 2023, arXiv:2104.05942
[26] Roberts, J. W.; Manchester, I. R.; Tedrake, R., Feedback controller parameterizations for reinforcement learning, (2011 IEEE symposium on adaptive dynamic programming and reinforcement learning, 2011, IEEE: IEEE Paris), 310-317
[27] Rudelson, M.; Vershynin, R., Hanson-Wright inequality and sub-Gaussian concentration, Electronic Communications in Probability, 18, 2013 · Zbl 1329.60056
[28] Silver, D.; Lever, G.; Heess, N.; Degris, T.; Wierstra, D.; Riedmiller, M., Deterministic policy gradient algorithms, (International conference on machine learning, Vol. 32, 2014, PMLR), 387-395
[29] Sontag, E. D., Smooth stabilization implies coprime factorization, IEEE Transactions on Automatic Control, 34, 4, 435-443, 1989 · Zbl 0682.93045
[30] van Waarde, H. J.; De Persis, C.; Camlibel, M. K.; Tesi, P., Willems’ fundamental lemma for state-space systems and its extension to multiple datasets, IEEE Control Systems Letters, 4, 3, 602-607, 2020
[31] Wang, R.; Barbara, N. H.; Revay, M.; Manchester, I. R., Learning over all stabilizing nonlinear controllers for a partially-observed linear system, IEEE Control Systems Letters, 7, 91-96, 2022
[32] Willems, J. C.; Rapisarda, P.; Markovsky, I.; De Moor, B. L., A note on persistency of excitation, Systems & Control Letters, 54, 4, 325-329, 2005 · Zbl 1129.93362
[33] Zhang, H.; Cui, L.; Zhang, X.; Luo, Y., Data-driven robust approximate optimal tracking control for unknown general nonlinear systems using adaptive dynamic programming method, IEEE Transactions on Neural Networks, 22, 12, 2226-2236, 2011