DeepBayes – an estimator for parameter estimation in stochastic nonlinear dynamical models. (English) Zbl 1539.93189

Summary: Stochastic nonlinear dynamical systems are ubiquitous in modern, real-world applications. Yet, estimating the unknown parameters of stochastic nonlinear dynamical models remains a challenging problem. The majority of existing methods employ maximum likelihood or Bayesian estimation. However, these methods suffer from limitations, most notably the substantial computational time required for inference coupled with limited flexibility in application. In this work, we propose DeepBayes estimators that leverage the power of deep recurrent neural networks. The method first trains a recurrent neural network to minimize the mean-squared estimation error over a set of synthetically generated data, using models drawn from the model set of interest. The a priori trained estimator can then be used directly for inference by evaluating the network on the estimation data. Because the deep recurrent neural network architectures are trained offline, inference reduces to a single forward pass, yielding significant time savings. We experiment with two popular recurrent neural networks: the long short-term memory (LSTM) network and the gated recurrent unit (GRU). We demonstrate the applicability of our proposed method on different example models and perform detailed comparisons with state-of-the-art approaches. We also provide a study on a real-world nonlinear benchmark problem. The experimental evaluations show that the proposed approach is asymptotically as good as the Bayes estimator.
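The training scheme described above — draw parameters from a prior, simulate synthetic data from the corresponding models, and fit an estimator that maps observed sequences to parameters by minimizing the mean-squared estimation error — can be sketched in a few lines. The sketch below is illustrative only: for brevity it replaces the recurrent network with a linear regressor on hand-crafted sequence features, and the state-space model, noise levels, and prior range are hypothetical choices, not those of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(theta, T=50):
    # Hypothetical stochastic nonlinear state-space model for illustration:
    #   x[k+1] = theta * x[k] / (1 + x[k]^2) + process noise
    #   y[k]   = x[k] + measurement noise
    x = 0.0
    y = np.empty(T)
    for k in range(T):
        y[k] = x + 0.1 * rng.standard_normal()
        x = theta * x / (1.0 + x * x) + 0.5 * rng.standard_normal()
    return y

def features(y):
    # Hand-crafted sequence summaries standing in for the RNN's learned
    # representation (sample moments and lagged products).
    return np.array([1.0, np.mean(y), np.mean(y**2),
                     np.mean(y[1:] * y[:-1]), np.mean(y[2:] * y[:-2])])

# Offline "training": Monte Carlo simulation from the prior, then a
# least-squares fit that minimizes the mean-squared estimation error.
N = 4000
thetas = rng.uniform(-1.0, 1.0, size=N)            # prior over the parameter
Phi = np.stack([features(simulate(t)) for t in thetas])
w, *_ = np.linalg.lstsq(Phi, thetas, rcond=None)

# Inference: one cheap evaluation per new data record (no iterative sampling).
def estimate(y):
    return features(y) @ w

# Fresh test data: the trained estimator should beat the prior mean (zero).
test_thetas = rng.uniform(-1.0, 1.0, size=500)
est = np.array([estimate(simulate(t)) for t in test_thetas])
mse_est = np.mean((est - test_thetas) ** 2)
mse_prior_mean = np.mean(test_thetas ** 2)
```

In the paper, the `features` map and weight vector `w` are jointly replaced by an LSTM or GRU trained with stochastic gradient descent, which is what allows the estimator to approach the Bayes estimator as the amount of synthetic training data grows.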

MSC:

93E12 Identification in stochastic control theory
93E10 Estimation and detection in stochastic control theory
93C10 Nonlinear systems in control theory

References:

[1] Abdalmoaty, M. R.-H.; Hjalmarsson, H., Linear prediction error methods for stochastic nonlinear models, Automatica, 105, 49-63, (2019) · Zbl 1429.93362
[2] Abdalmoaty, M. R.-H.; Hjalmarsson, H., Identification of stochastic nonlinear models using optimal estimating functions, Automatica, 119, Article 109055 pp., (2020) · Zbl 1451.93401
[3] Andrieu, C.; Doucet, A.; Holenstein, R., Particle Markov chain Monte Carlo methods, Journal of the Royal Statistical Society. Series B. Statistical Methodology, 72, 3, 269-342, (2010) · Zbl 1411.65020
[4] Caines, P.; Rissanen, J., Maximum likelihood estimation of parameters in multivariate Gaussian stochastic processes (corresp.), IEEE Transactions on Information Theory, 20, 1, 102-104, (1974) · Zbl 0283.62085
[5] Chen, G., A gentle tutorial of recurrent neural network with error backpropagation, (2016), arXiv preprint arXiv:1610.02583
[6] Chib, S.; Greenberg, E., Understanding the Metropolis-Hastings algorithm, The American Statistician, 49, 4, 327-335, (1995)
[7] Cho, K.; van Merrienboer, B.; Bahdanau, D.; Bengio, Y., On the properties of neural machine translation: encoder-decoder approaches, (2014), arXiv preprint arXiv:1409.1259
[8] Fraccaro, M., Deep latent variable models for sequential data, (2018)
[9] Garatti, S.; Bittanti, S., A new paradigm for parameter estimation in system modeling, International Journal of Adaptive Control and Signal Processing, 27, 8, 667-687, (2013) · Zbl 1284.93225
[10] Gedon, D.; Wahlström, N.; Schön, T. B.; Ljung, L., Deep state space models for nonlinear system identification, IFAC-PapersOnLine, 54, 7, 481-486, (2021)
[11] Ghahramani, Z.; Roweis, S. T., Learning nonlinear dynamical systems using an EM algorithm, (Advances in neural information processing systems, (1999)), 431-437
[12] Girin, L.; Leglaive, S.; Bie, X.; Diard, J.; Hueber, T.; Alameda-Pineda, X., Dynamical variational autoencoders: A comprehensive review, Foundations and Trends in Machine Learning, 15, 1-2, 1-175, (2021) · Zbl 1491.68153
[13] Goodfellow, I., NeurIPS 2016 tutorial: Generative adversarial networks, (2016), arXiv preprint arXiv:1701.00160
[14] Goodfellow, I.; Bengio, Y.; Courville, A., Deep learning, (2016), MIT Press · Zbl 1373.68009
[15] Hastings, W. K., Monte Carlo sampling methods using Markov chains and their applications, Biometrika, 57, 1, 97-109, (1970) · Zbl 0219.65008
[16] Hewamalage, H.; Bergmeir, C.; Bandara, K., Recurrent neural networks for time series forecasting: Current status and future directions, International Journal of Forecasting, 37, 1, 388-427, (2021)
[17] Hochreiter, S.; Schmidhuber, J., Long Short-Term Memory, Neural Computation, 9, 8, 1735-1780, (1997)
[18] Kantas, N.; Doucet, A.; Singh, S. S.; Maciejowski, J.; Chopin, N., On particle methods for parameter estimation in state-space models, Statistical Science, 30, 3, 328-351, (2015) · Zbl 1332.62096
[19] Karl, M., Soelch, M., Bayer, J., & van der Smagt, P. (2017). Deep Variational Bayes Filters: Unsupervised Learning of State Space Models from Raw Data. In International conference on learning representations.
[20] Karpathy, A.; Johnson, J.; Fei-Fei, L., Visualizing and understanding recurrent networks, (2015), arXiv preprint arXiv:1506.02078
[21] Kingma, D. P.; Ba, J., Adam: A method for stochastic optimization, (2014), arXiv preprint arXiv:1412.6980
[22] Lindsten, F., An efficient stochastic approximation EM algorithm using conditional particle filters, (2013 IEEE international conference on acoustics, speech and signal processing, (2013), IEEE), 6274-6278
[23] Ljung, L., System identification toolbox: The MATLAB user’s guide, (1988)
[24] Ljung, L., Perspectives on system identification, Annual Reviews in Control, 34, 1, 1-12, (2010)
[25] Murray, R.; Livingston, S., Python control systems library, (2018), http://python-control.readthedocs.io/en/latest/index.html
[26] Ninness, B.; Henriksen, S., Bayesian system identification via Markov chain Monte Carlo techniques, Automatica, 46, 1, 40-51, (2010) · Zbl 1214.93115
[27] Ninness, B.; Wills, A.; Mills, A., UNIT: A freely available system identification toolbox, Control Engineering Practice, 21, 5, 631-644, (2013)
[28] Pascanu, R., Gulcehre, C., Cho, K., & Bengio, Y. (2014). How to construct deep recurrent neural networks. In International conference on learning representations.
[29] Paszke, A., PyTorch: An imperative style, high-performance deep learning library, (2019), arXiv preprint arXiv:1912.01703
[30] Pillonetto, G.; Aravkin, A.; Gedon, D.; Ljung, L.; Ribeiro, A. H.; Schön, T. B., Deep networks for system identification: a survey, (2023), arXiv preprint arXiv:2301.12832
[31] Pillonetto, G.; Dinuzzo, F.; Chen, T.; De Nicolao, G.; Ljung, L., Kernel methods in system identification, machine learning and function estimation: A survey, Automatica, 50, 3, 657-682, (2014) · Zbl 1298.93342
[32] Pintelon, R.; Schoukens, J., System identification: a frequency domain approach, (2012), John Wiley & Sons
[33] Rossi, S.; Michiardi, P.; Filippone, M., Good initializations of variational Bayes for deep models, (International conference on machine learning, (2019), PMLR), 5487-5497
[34] Rumelhart, D. E.; Hinton, G. E.; Williams, R. J., Learning representations by back-propagating errors, Nature, 323, 6088, 533-536, (1986) · Zbl 1369.68284
[35] Schön, T. B.; Lindsten, F.; Dahlin, J.; Wågberg, J.; Naesseth, C. A.; Svensson, A., Sequential Monte Carlo methods for system identification, IFAC-PapersOnLine, 48, 28, 775-786, (2015)
[36] Schön, T. B.; Wills, A.; Ninness, B., System identification of nonlinear state-space models, Automatica, 47, 1, 39-49, (2011) · Zbl 1209.93155
[37] Sjöberg, J.; Hjalmarsson, H.; Ljung, L., Neural networks in system identification, IFAC Proceedings Volumes, 27, 8, 359-382, (1994)
[38] Sjöberg, J.; Zhang, Q.; Ljung, L.; Benveniste, A.; Delyon, B.; Glorennec, P.-Y., Nonlinear black-box modeling in system identification: a unified overview, Automatica, 31, 12, 1691-1724, (1995) · Zbl 0846.93018
[39] Taghavi, E.; Lindsten, F.; Svensson, L.; Schön, T. B., Adaptive stopping for fast particle smoothing, (2013 IEEE international conference on acoustics, speech and signal processing, (2013), IEEE), 6293-6297
[40] Tang, D., Qin, B., & Liu, T. (2015). Document modeling with gated recurrent neural network for sentiment classification. In Proceedings of the 2015 conference on empirical methods in natural language processing (pp. 1422-1432).
[41] Wei, G. C.; Tanner, M. A., A Monte Carlo implementation of the EM algorithm and the poor man’s data augmentation algorithms, Journal of the American Statistical Association, 85, 411, 699-704, (1990)
[42] Wenzel, F.; Roth, K.; Veeling, B. S.; Świątkowski, J.; Tran, L.; Mandt, S., How good is the Bayes posterior in deep neural networks really?, (2020), arXiv preprint arXiv:2002.02405
[43] Wigren, T., Recursive identification based on nonlinear state space models applied to drum-boiler dynamics with nonlinear output equations, (Proceedings of the 2005, american control conference, 2005, (2005), IEEE), 5066-5072
[44] Wigren, T.; Schoukens, M., Coupled electric drives data set and reference models, (2017), Department of Information Technology, Uppsala Universitet
[45] Zancato, L.; Chiuso, A., A novel deep neural network architecture for non-linear system identification, IFAC-PapersOnLine, 54, 7, 186-191, (2021)
[46] Zhu, Y., Multivariable system identification for process control, (2001), Elsevier
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.