
Model-free LQR design by Q-function learning. (English) Zbl 1485.93031

Summary: Reinforcement learning methods such as Q-learning have shown promising results in the model-free design of linear quadratic regulator (LQR) controllers for linear time-invariant (LTI) systems. However, challenges such as sample inefficiency, sensitivity to hyperparameters, and limited compatibility with classical control paradigms restrict the integration of such algorithms in critical control applications. This paper takes steps toward bridging well-established classical control requirements and learning algorithms by exploiting optimization frameworks and properties of conic constraints. Accordingly, a new off-policy, model-free approach is proposed for learning the Q-function and designing the discrete-time LQR controller. The design procedure is based on non-iterative semidefinite programs (SDPs) with linear matrix inequality (LMI) constraints. It is sample-efficient, inherently robust to model uncertainties, and does not require an initial stabilizing controller. The proposed model-free approach is also extended to distributed control of interconnected systems. The performance of the presented design is evaluated on several stable and unstable synthetic systems, and the data-driven control scheme is implemented on the IEEE 39-bus New England power grid. The results confirm the optimality, sample efficiency, and satisfactory performance of the proposed approach in both centralized and distributed design.
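The Q-function learning idea behind such designs can be illustrated with a minimal numpy sketch in the classical policy-iteration style (as in Bradtke et al. [8]): the Q-function of a linear policy is quadratic, so its parameters can be fit by least squares from off-policy data and then minimized in closed form to improve the gain. Note this is not the paper's non-iterative SDP/LMI formulation (which would require an SDP solver such as CVX) and, unlike the proposed method, it starts from a stabilizing gain (here K = 0, valid because the example system is open-loop stable); all system matrices and sample sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulator-only dynamics: the learner only sees (x, u, cost, x_next) tuples.
A = np.array([[0.9, 0.2],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
Qc = np.eye(2)          # state cost weight
Rc = np.array([[1.0]])  # input cost weight
n, m = 2, 1

def quad_features(z):
    """Monomials parameterizing z^T H z for symmetric H (upper triangle)."""
    i, j = np.triu_indices(n + m)
    scale = np.where(i == j, 1.0, 2.0)  # each off-diagonal term appears twice
    return scale * np.outer(z, z)[i, j]

def evaluate_policy(K, num_samples=200):
    """Least-squares solve of the Bellman equation for Q_K from off-policy data."""
    rows, targets = [], []
    x = rng.standard_normal(n)
    for _ in range(num_samples):
        u = rng.standard_normal(m)                      # exploratory input
        cost = x @ Qc @ x + u @ Rc @ u
        x_next = A @ x + B @ u
        z = np.concatenate([x, u])
        z_next = np.concatenate([x_next, -K @ x_next])  # policy's next action
        rows.append(quad_features(z) - quad_features(z_next))
        targets.append(cost)
        x = x_next
    theta, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    H = np.zeros((n + m, n + m))
    H[np.triu_indices(n + m)] = theta
    return H + np.triu(H, 1).T                          # symmetrize

# Policy iteration on the learned Q-function (K0 = 0 is stabilizing here).
K = np.zeros((m, n))
for _ in range(6):
    H = evaluate_policy(K)
    K = np.linalg.solve(H[n:, n:], H[n:, :n])           # greedy improvement

# Model-based reference gain from the Riccati recursion, for comparison only.
P = Qc.copy()
for _ in range(1000):
    P = Qc + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(
        Rc + B.T @ P @ B, B.T @ P @ A)
K_star = np.linalg.solve(Rc + B.T @ P @ B, B.T @ P @ A)

print("learned K:", K)
print("Riccati K:", K_star)
```

On noise-free data the learned gain matches the model-based Riccati solution; the iterative structure and the need for a stabilizing initial gain are exactly the limitations the paper's non-iterative SDP formulation is designed to remove.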

MSC:

93A15 Large-scale systems
93C05 Linear systems in control theory
93B52 Feedback control
49N10 Linear-quadratic optimal control problems
90C25 Convex programming
90C22 Semidefinite programming

Software:

CVX
