Abstract
We focus on the control of unknown partial differential equations (PDEs). The system dynamics are unknown, but we assume that we can observe their evolution for a given control input, as is typical in a reinforcement learning framework. We propose an algorithm based on the idea of controlling and identifying the unknown system configuration on the fly. In this work, the control is based on the state-dependent Riccati equation (SDRE) approach, whereas the identification of the model relies on Bayesian linear regression. At each iteration, based on the observed data, we obtain an estimate of the a priori unknown parameter configuration of the PDE and then compute the control for the corresponding model. We provide numerical evidence of the convergence of the method for infinite-horizon control problems.
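The loop sketched in the abstract alternates identification and control: observe the evolution under the current input, update the parameter estimate by Bayesian linear regression, then recompute the Riccati feedback for the identified model. As an illustration only (not the authors' MATLAB implementation, available at the repository above), the following Python sketch applies this idea to a linear test case, a semi-discretized heat equation with an unknown scalar diffusion coefficient; since this test model is linear, the state-dependent Riccati equation reduces to a standard LQR Riccati feedback. All names and parameter values (`theta_true`, the noise and prior variances, the grid size) are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Illustrative setup: semi-discretized heat equation x' = theta * Lap @ x + B @ u,
# where theta (the diffusion coefficient) is the unknown parameter.
n = 8
h = 1.0 / (n + 1)
Lap = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
       + np.diag(np.ones(n - 1), -1)) / h**2
B = np.eye(n)
Q, R = np.eye(n), np.eye(n)
theta_true = 0.5                                # hidden from the controller

def blr_posterior_mean(phi, y, sigma2=1e-4, tau2=1.0):
    """Bayesian linear regression for y = theta * phi + noise,
    with scalar theta, Gaussian prior N(0, tau2), noise variance sigma2."""
    precision = phi @ phi / sigma2 + 1.0 / tau2
    return (phi @ y / sigma2) / precision

dt = 1e-3
x = np.sin(np.pi * h * np.arange(1, n + 1))     # initial state
u = np.zeros(n)
phis, ys = [], []
theta_hat = 1.0                                 # initial parameter guess
for _ in range(400):
    # "Observe" the true (unknown-to-us) evolution for the current input.
    x_new = x + dt * (theta_true * Lap @ x + B @ u)
    # Regression data: the residual x' - B u equals theta * (Lap x).
    xdot = (x_new - x) / dt
    phis.append(Lap @ x)
    ys.append(xdot - B @ u)
    theta_hat = blr_posterior_mean(np.concatenate(phis), np.concatenate(ys))
    # Riccati feedback for the currently identified model.
    P = solve_continuous_are(theta_hat * Lap, B, Q, R)
    u = -np.linalg.solve(R, B.T @ P) @ x_new
    x = x_new
```

In this noise-free linear setting the posterior mean recovers the diffusion coefficient after a few observations and the feedback drives the state toward zero; in the paper's setting, the same observe-identify-control loop is run with the SDRE for nonlinear PDEs.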
Code availability
The MATLAB source code for the implementations used to compute the presented results can be downloaded from https://github.com/alessandroalla/SDRE-RL upon request to the corresponding author.
Data availability
No data has been used in this paper.
Acknowledgements
The authors wish to express their deep gratitude to Maurizio Falcone, thanks to whom they met and began collaborating on this project.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Communicated by: Stefan Volkwein
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
A. Alla and A. Pacifico are members of the INdAM-GNCS activity group. A. Alla is part of the INdAM-GNCS Project “Metodi numerici innovativi per equazioni di Hamilton-Jacobi” (CUP_E53C23001670001). The work of A.A. was carried out within the “Data-driven discovery and control of multi-scale interacting artificial agent systems” project, and received funding from the European Union NextGenerationEU - National Recovery and Resilience Plan (NRRP) - MISSION 4 COMPONENT 2, INVESTMENT 1.1 Fondo per il Programma Nazionale di Ricerca e Progetti di Rilevante Interesse Nazionale (PRIN) - Project Code P2022JC95T, CUP H53D23008920001. The work of M. Palladino is partially funded by the University of L’Aquila Starting Project Grant “Optimal Control and Applications” and by the INdAM-GNAMPA project, n. CUP_E53C22001930001.
Andrea Pesare is an independent researcher at the time this manuscript is being processed for publication.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Alla, A., Pacifico, A., Palladino, M. et al. Online identification and control of PDEs via reinforcement learning methods. Adv Comput Math 50, 85 (2024). https://doi.org/10.1007/s10444-024-10167-y
Keywords
- Reinforcement learning
- System identification
- Stabilization of PDEs
- State-dependent Riccati equations
- Bayesian linear regression
- Numerical approximation