
Online identification and control of PDEs via reinforcement learning methods

Published in: Advances in Computational Mathematics (2024)

Abstract

We focus on the control of unknown partial differential equations (PDEs). The system dynamics is unknown, but we assume we are able to observe its evolution for a given control input, as is typical in a reinforcement learning framework. We propose an algorithm based on the idea of controlling and identifying the unknown system configuration on the fly. In this work, the control is based on the state-dependent Riccati equation (SDRE) approach, whereas the identification of the model relies on Bayesian linear regression. At each iteration, based on the observed data, we obtain an estimate of the a priori unknown parameter configuration of the PDE, and then we compute the control for the corresponding model. We provide numerical evidence of the convergence of the method for infinite horizon control problems.
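
To make the loop concrete, the sketch below mocks up one plausible realization in MATLAB (the language of the authors' released code). It is an illustration under simplifying assumptions, not the authors' implementation: the semi-discretized PDE is taken linear and affine in two unknown scalar parameters, x' = (theta1*A1 + theta2*A2)*x + B*u, so the state-dependent Riccati step reduces to a standard LQR solve, and the Bayesian linear regression becomes a conjugate Gaussian posterior update with a fixed noise variance. All names, sizes, and weights below are hypothetical.

% Hypothetical sketch of the identify-and-control loop (not the authors' code).
% Requires the Control System Toolbox for lqr.
n  = 50;  h = 1/(n+1);  dt = 1e-3;
e  = ones(n,1);
A1 = spdiags([e -2*e e], -1:1, n, n)/h^2;           % diffusion stencil
A2 = spdiags([-e zeros(n,1) e], -1:1, n, n)/(2*h);  % advection stencil
B  = eye(n);                  % distributed control (an assumption)
Q  = h*eye(n);  R = 1e-2*eye(n);

theta_true = [0.1; 1.0];      % "true" parameters, unknown to the controller
Atrue = theta_true(1)*A1 + theta_true(2)*A2;

mu    = [1; 0];               % Gaussian prior mean over theta (deliberately off)
Sigma = 10*eye(2);            % prior covariance
sig2  = 1e-4;                 % assumed observation-noise variance

x = sin(pi*h*(1:n)');         % initial state
for k = 1:2000
    % 1) control the current model estimate; in this linear-affine case the
    %    state-dependent Riccati feedback is the LQR gain
    Ahat = mu(1)*A1 + mu(2)*A2;
    K = lqr(full(Ahat), B, Q, R);
    u = -K*x;

    % 2) observe one step of the true (unknown) system (explicit Euler)
    xnew = x + dt*(Atrue*x + B*u);

    % 3) conjugate Bayesian update: the Euler residual is linear in theta,
    %    (xnew - x)/dt - B*u = [A1*x, A2*x]*theta + noise
    Phi = [A1*x, A2*x];
    y   = (xnew - x)/dt - B*u;
    SigInvNew = inv(Sigma) + (Phi'*Phi)/sig2;
    mu    = SigInvNew \ (Sigma\mu + (Phi'*y)/sig2);
    Sigma = inv(SigInvNew);

    x = xnew;
end
fprintf('posterior mean for theta: [%g, %g]\n', mu(1), mu(2));

Run as-is, the posterior mean should drift toward theta_true while the feedback steers the state toward zero (refreshing the gain less often would be cheaper); the authors' actual method, which handles the nonlinear SDRE case, is the one released in the repository below.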


Code availability

The MATLAB source code used to compute the presented results is available at https://github.com/alessandroalla/SDRE-RL upon request to the corresponding author.

Data availability

No data were used in this paper.


Acknowledgements

The authors wish to express their deep gratitude to Maurizio Falcone; thanks to him, the authors met and began collaborating on this project.

Author information


Corresponding author

Correspondence to Alessandro Alla.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Communicated by: Stefan Volkwein

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A. Alla and A. Pacifico are members of the INdAM-GNCS activity group. A. Alla is part of the INdAM-GNCS project "Metodi numerici innovativi per equazioni di Hamilton-Jacobi" (CUP E53C23001670001). The work of A.A. was carried out within the project "Data-driven discovery and control of multi-scale interacting artificial agent systems" and received funding from the European Union NextGenerationEU - National Recovery and Resilience Plan (NRRP) - MISSION 4 COMPONENT 2, INVESTMENT 1.1 Fondo per il Programma Nazionale di Ricerca e Progetti di Rilevante Interesse Nazionale (PRIN) - Project Code P2022JC95T, CUP H53D23008920001. The work of M. Palladino is partially funded by the University of L'Aquila Starting Project Grant "Optimal Control and Applications" and by the INdAM-GNAMPA project (CUP E53C22001930001).

Andrea Pesare was an independent researcher at the time this manuscript was processed for publication.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Alla, A., Pacifico, A., Palladino, M. et al. Online identification and control of PDEs via reinforcement learning methods. Adv Comput Math 50, 85 (2024). https://doi.org/10.1007/s10444-024-10167-y



  • DOI: https://doi.org/10.1007/s10444-024-10167-y
