Safe exploration in model-based reinforcement learning using control barrier functions. (English) Zbl 1505.93123

Summary: This paper develops a model-based reinforcement learning (MBRL) framework for learning online the value function of an infinite-horizon optimal control problem while obeying safety constraints expressed as control barrier functions (CBFs). Our approach is facilitated by the development of a novel class of CBFs, termed Lyapunov-like CBFs (LCBFs), that retain the beneficial properties of CBFs for developing minimally invasive safe control policies while also possessing desirable Lyapunov-like qualities such as positive semi-definiteness. We show how these LCBFs can be used to augment a learning-based control policy to guarantee safety, and then leverage this approach to develop a safe exploration framework in an MBRL setting. We demonstrate via numerical examples that our approach can handle more general safety constraints than comparable methods.
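
To illustrate what a "minimally invasive" CBF-based policy means, the following is a minimal sketch of the standard CBF quadratic-program safety filter from the general CBF literature. It is not the LCBF construction or the MBRL algorithm of the paper. All specifics are assumptions made for illustration: single-integrator dynamics x' = u, a circular obstacle with barrier h(x) = ||x - c||^2 - r^2, a linear class-K term alpha*h, a proportional nominal policy standing in for a learned one, and the hypothetical names `barrier`, `safety_filter` and `simulate`. With a single affine constraint, the quadratic program has a closed-form solution, so no solver is needed.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def barrier(x: Sequence[float], center: Sequence[float], radius: float) -> float:
    """Assumed barrier h(x) = ||x - center||^2 - radius^2; safe set is {x : h(x) >= 0}."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, center)) - radius ** 2


def barrier_grad(x: Sequence[float], center: Sequence[float]) -> List[float]:
    """Gradient of the assumed barrier with respect to x."""
    return [2.0 * (xi - ci) for xi, ci in zip(x, center)]


def safety_filter(x, u_nom, center, radius, alpha: float = 1.0) -> List[float]:
    """Closed-form solution of
           min ||u - u_nom||^2  s.t.  grad_h(x) . u + alpha * h(x) >= 0
    for the assumed dynamics x' = u."""
    a = barrier_grad(x, center)
    slack = dot(a, u_nom) + alpha * barrier(x, center, radius)
    if slack >= 0.0:
        # Nominal input already satisfies the CBF condition: leave it untouched.
        return list(u_nom)
    norm_sq = dot(a, a)
    if norm_sq == 0.0:
        # The gradient vanishes only at the obstacle center, which is unsafe.
        raise ValueError("CBF condition is infeasible at this state")
    # Project u_nom onto the boundary of the half-space defined by the constraint.
    scale = slack / norm_sq
    return [ui - scale * ai for ui, ai in zip(u_nom, a)]


def simulate(x0, goal, center, radius, dt: float = 0.01,
             steps: int = 1000) -> Tuple[List[float], float]:
    """Forward-Euler rollout; returns the final state and the smallest h seen."""
    x = list(x0)
    min_h = barrier(x, center, radius)
    for _ in range(steps):
        # Proportional nominal policy, a stand-in for a learned policy.
        u_nom = [gi - xi for gi, xi in zip(goal, x)]
        u = safety_filter(x, u_nom, center, radius)
        x = [xi + dt * ui for xi, ui in zip(x, u)]
        min_h = min(min_h, barrier(x, center, radius))
    return x, min_h


if __name__ == "__main__":
    final_x, min_h = simulate([-2.0, 0.1], [2.0, 0.0], [0.0, 0.0], 1.0)
    print("final state:", final_x, "min h:", min_h)
```

The filter changes the nominal input only when the CBF condition would otherwise be violated, and then by the smallest amount in the Euclidean norm; this is the sense in which such policies are minimally invasive.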

MSC:

93C40 Adaptive control/observation systems
68T05 Learning and adaptive systems in artificial intelligence
93C10 Nonlinear systems in control theory
49L20 Dynamic programming in optimal control and differential games

References:

[1] Ames, A. D., Coogan, S., Egerstedt, M., Notomista, G., Sreenath, K., & Tabuada, P. (2019). Control barrier functions: Theory and applications. In Proc. Eur. control conf. (pp. 3420-3431).
[2] Ames, A. D.; Xu, X.; Grizzle, J. W.; Tabuada, P., Control barrier function based quadratic programs for safety critical systems, IEEE Transactions on Automatic Control, 62, 8, 3861-3876 (2017) · Zbl 1373.90092
[3] Blanchini, F., Set invariance in control, Automatica, 35, 11, 1747-1767 (1999) · Zbl 0935.93005
[4] Cheng, R.; Orosz, G.; Murray, R. M.; Burdick, J. W., End-to-end safe reinforcement learning through barrier functions for safety-critical continuous control tasks, Proc. conf. on artificial intel., 3387-3395 (2019)
[5] Chowdhary, G., Concurrent learning for convergence in adaptive control without persistency of excitation (2010), Georgia Institute of Technology, Atlanta, GA (Ph.D. thesis)
[6] Cohen, M. H., & Belta, C. (2020). Approximate optimal control for safety-critical systems with control barrier functions. In Proc. conf. decis. control (pp. 2062-2067).
[7] Deptula, P.; Bell, Z. I.; Doucette, E. A.; Curtis, J. W.; Dixon, W. E., Data-based reinforcement learning approximate optimal control for an uncertain nonlinear system with control effectiveness faults, Automatica, 116, 1-10 (2020) · Zbl 1440.93137
[8] Deptula, P.; Bell, Z. I.; Zegers, F. M.; Licitra, R. A.; Dixon, W. E., Approximate optimal influence over an agent through an uncertain interaction dynamic, Automatica, 134, 1-13 (2021) · Zbl 1478.93023
[9] Deptula, P.; Chen, H. Y.; Licitra, R.; Rosenfeld, J. A.; Dixon, W. E., Approximate optimal motion planning to avoid unknown Moving Avoidance Regions, IEEE Transactions on Robotics, 32, 2, 414-430 (2020)
[10] Fisac, J. F.; Akametalu, A. K.; Zeilinger, M. N.; Kaynama, S.; Gillula, J.; Tomlin, C. J., A general safety framework for learning-based control in uncertain robotic systems, IEEE Transactions on Automatic Control, 64, 7, 2737-2752 (2019) · Zbl 1482.93720
[11] Greene, M. L.; Deptula, P.; Nivison, S.; Dixon, W. E., Sparse learning-based approximate dynamic programming with barrier constraints, IEEE Control Systems Letters, 4, 3, 743-748 (2020)
[12] Gros, S.; Zanon, M., Data-driven economic NMPC using reinforcement learning, IEEE Transactions on Automatic Control, 65, 2, 636-648 (2020) · Zbl 1533.93176
[13] Hewing, L.; Wabersich, K. P.; Menner, M.; Zeilinger, M. N., Learning-based model predictive control: Toward safe learning in control, Annual Review of Control, Robotics, and Autonomous Systems, 3, 269-296 (2020)
[14] Ioannou, P.; Fidan, B., Adaptive control tutorial (2006), SIAM · Zbl 1116.93001
[15] Jankovic, M., Robust control barrier functions for constrained stabilization of nonlinear systems, Automatica, 96, 359-367 (2018) · Zbl 1406.93091
[16] Kamalapurkar, R.; Rosenfeld, J. A.; Dixon, W. E., Efficient model-based reinforcement learning for approximate online optimal control, Automatica, 74, 247-258 (2016) · Zbl 1348.93167
[17] Kamalapurkar, R.; Walters, P.; Dixon, W. E., Model-based reinforcement learning for approximate optimal regulation, Automatica, 64, 94-104 (2016) · Zbl 1329.93051
[18] Kamalapurkar, R.; Walters, P.; Rosenfeld, J. A.; Dixon, W. E., Reinforcement learning for optimal feedback control: A Lyapunov-based approach (2018), Springer · Zbl 1403.49001
[19] Khalil, H. K., Nonlinear systems (2002), Prentice Hall · Zbl 1003.34002
[20] Kiumarsi, B.; Vamvoudakis, K. G.; Modares, H.; Lewis, F. L., Optimal and autonomous control using reinforcement learning: A survey, IEEE Transactions on Neural Networks and Learning Systems, 29, 6, 2042-2062 (2017)
[21] Krstić, M.; Kanellakopoulos, I.; Kokotović, P., Nonlinear and adaptive control design (1995), John Wiley & Sons · Zbl 0763.93043
[22] Lewis, F. L.; Vrabie, D.; Vamvoudakis, K. G., Reinforcement learning and feedback control: Using natural decision methods to design optimal adaptive controllers, IEEE Control Systems, 32, 6, 76-105 (2012) · Zbl 1395.93584
[23] Li, Z., Kalabić, U., & Chu, T. (2018). Safe reinforcement learning: Learning with supervision using a constraint-admissible set. In Proc. Amer. control conf. (pp. 6390-6395).
[24] Liberzon, D., Switching in systems and control (2003), Birkhäuser, Boston, MA · Zbl 1036.93001
[25] Lopez, B. T.; Slotine, J. J.; How, J. P., Robust adaptive control barrier functions: An adaptive and data-driven approach to safety, IEEE Control Systems Letters, 5, 3, 1031-1036 (2021)
[26] Mahmud, S. M. N., Hareland, K., Nivison, S. A., Bell, Z. I., & Kamalapurkar, R. (2021). A Safety Aware Model-Based Reinforcement Learning Framework for Systems with Uncertainties. In Proc. Amer. control conf. (pp. 1979-1984).
[27] Marvi, Z.; Kiumarsi, B., Safe reinforcement learning: A control barrier function optimization approach, International Journal of Robust and Nonlinear Control, 31, 6, 1923-1940 (2021) · Zbl 1526.93181
[28] Panagou, D.; Stipanovic, D. M.; Voulgaris, P. G., Distributed coordination control for multi-robot networks using Lyapunov-like barrier functions, IEEE Transactions on Automatic Control, 61, 3, 617-632 (2016) · Zbl 1359.93026
[29] Parikh, A.; Kamalapurkar, R.; Dixon, W. E., Integral concurrent learning: Adaptive control with parameter convergence using finite excitation, International Journal of Adaptive Control and Signal Processing, 33, 12, 1775-1787 (2019) · Zbl 1451.93200
[30] Rosenfeld, J. A.; Kamalapurkar, R.; Dixon, W. E., The state following (StaF) approximation method, IEEE Transactions on Neural Networks and Learning Systems, 30, 6, 1716-1730 (2019)
[31] Sontag, E. D., Mathematical control theory: Deterministic finite dimensional systems (2013), Springer Science & Business Media
[32] Taylor, A. J., & Ames, A. D. (2020). Adaptive safety with control barrier functions. In Proc. Amer. control conf. (pp. 1399-1405).
[33] Taylor, A. J.; Singletary, A.; Yue, Y.; Ames, A., Learning for safety-critical control with control barrier functions, Proc. conf. learning for dyn. and control, PMLR, Vol. 120, 708-717 (2020)
[34] Tee, K. P.; Ge, S. S.; Tay, E. H., Barrier Lyapunov functions for the control of output-constrained nonlinear systems, Automatica, 45, 4, 918-927 (2009) · Zbl 1162.93346
[35] Vamvoudakis, K. G.; Lewis, F. L., Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem, Automatica, 46, 5, 878-888 (2010) · Zbl 1191.49038
[36] Vamvoudakis, K. G.; Miranda, M. F.; Hespanha, J. P., Asymptotically stable adaptive-optimal control algorithm with saturating actuators and relaxed persistence of excitation, IEEE Transactions on Neural Networks and Learning Systems, 27, 11, 2386-2398 (2015)
[37] Wieland, P., & Allgöwer, F. (2007). Constructive safety using control barrier functions. In Proc. IFAC symp. nonlin. control syst.
[38] Willis, A. G.; Heath, W. P., Barrier function based model predictive control, Automatica, 40, 8, 1415-1422 (2004) · Zbl 1077.93024
[39] Yang, Y.; Vamvoudakis, K. G.; Modares, H., Safe reinforcement learning for dynamical games, International Journal of Robust and Nonlinear Control, 30, 9, 3706-3726 (2020) · Zbl 1466.91038
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases, these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.