
Q-learning based tracking control with novel finite-horizon performance index. (English) Zbl 1542.93162

Summary: A data-driven method is designed to realize model-free finite-horizon optimal tracking control (FHOTC) of unknown linear discrete-time systems based on Q-learning in this paper. First, a novel finite-horizon performance index (FHPI) that depends only on the next-step tracking error is introduced. Then, an augmented system is formulated that incorporates the system model and the reference trajectory model. Based on the novel FHPI, a derivation of the augmented time-varying Riccati equation (ATVRE) is provided. We present a data-driven FHOTC method that uses Q-learning to optimize the defined time-varying Q-function, which allows the solutions of the ATVRE to be estimated without knowledge of the system dynamics. Finally, the validity and features of the proposed Q-learning-based FHOTC method are demonstrated through comparative simulation studies.
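The core idea — replacing a backward time-varying Riccati recursion with a step-wise quadratic Q-function fitted from data — can be illustrated on the plain finite-horizon LQ regulator; in the paper's tracking setting the state would be the augmented state combining the system and the trajectory model. The sketch below is illustrative only, not the authors' algorithm: the plant matrices, weights, horizon, and least-squares identification are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative linear discrete-time plant x_{k+1} = A x_k + B u_k.
# A and B are used only to generate data samples; the learner itself
# never touches them (model-free, as in the paper's setting).
A = np.array([[1.0, 0.1],
              [0.0, 0.9]])
B = np.array([[0.0],
              [0.1]])
n, m = 2, 1

Qw = np.eye(n)            # stage state weight
Rw = 0.1 * np.eye(m)      # stage input weight
Qf = 2.0 * np.eye(n)      # terminal state weight
N = 10                    # finite horizon

def features(z):
    """Quadratic features: z^T H z = features(z) @ theta for symmetric H."""
    i, j = np.triu_indices(len(z))
    scale = np.where(i == j, 1.0, 2.0)       # off-diagonal terms appear twice
    return np.outer(z, z)[i, j] * scale

def theta_to_H(theta, d):
    """Rebuild the symmetric Q-function kernel H from the fitted parameters."""
    U = np.zeros((d, d))
    U[np.triu_indices(d)] = theta
    return U + U.T - np.diag(np.diag(U))

# --- Backward-in-time Q-learning pass (data-driven) ------------------------
P = Qf.copy()             # value kernel of the step after the current one
gains = [None] * N
for k in reversed(range(N)):
    # exploratory state/input samples and the observed next states
    X = rng.standard_normal((40, n))
    U = rng.standard_normal((40, m))
    Xn = X @ A.T + U @ B.T                   # "measured" data, not a model call
    stage = (np.einsum('ni,ij,nj->n', X, Qw, X)
             + np.einsum('ni,ij,nj->n', U, Rw, U))
    target = stage + np.einsum('ni,ij,nj->n', Xn, P, Xn)
    Phi = np.array([features(np.concatenate([x, u])) for x, u in zip(X, U)])
    theta, *_ = np.linalg.lstsq(Phi, target, rcond=None)
    H = theta_to_H(theta, n + m)
    Hxx, Hxu, Huu = H[:n, :n], H[:n, n:], H[n:, n:]
    gains[k] = -np.linalg.solve(Huu, Hxu.T)  # u_k = K_k x_k
    P = Hxx + Hxu @ gains[k]                 # Schur complement -> V_k kernel

# --- Model-based Riccati recursion, shown for comparison only --------------
Pm = Qf.copy()
gains_model = [None] * N
for k in reversed(range(N)):
    Km = -np.linalg.solve(Rw + B.T @ Pm @ B, B.T @ Pm @ A)
    gains_model[k] = Km
    Pm = Qw + Km.T @ Rw @ Km + (A + B @ Km).T @ Pm @ (A + B @ Km)
```

Because each step's Q-function is exactly quadratic in the state-input pair, the noiseless least-squares fit recovers the kernel `H` exactly, and the learned time-varying gains coincide with those of the model-based Riccati recursion — which is the sense in which the ATVRE solutions can be estimated without the system dynamics.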

MSC:

93C05 Linear systems in control theory
93C55 Discrete-time control/observation systems
68T05 Learning and adaptive systems in artificial intelligence
Full Text: DOI
