
A stochastic variational framework for recurrent Gaussian processes models. (English) Zbl 1435.68275

Summary: Gaussian process (GP) models have been successfully applied to the problem of learning from sequential observations. In this context, the family of recurrent Gaussian processes (RGPs) was recently introduced, with a structure specifically designed to handle dynamical data. However, RGPs share a limitation with most GP approaches: they become computationally infeasible on very large datasets. In the present work, with the aim of improving scalability, we modify the original variational approach used with RGPs to enable inference via stochastic mini-batch optimization, giving rise to the Stochastic Recurrent Variational Bayes (S-REVARB) framework. We review the recent related literature and comprehensively contextualize it with respect to our approach. Moreover, we propose two learning procedures, the Local and Global S-REVARB algorithms, which prevent computational costs from scaling with the number of training samples. The global variant permits even greater scalability by also preventing the number of variational parameters from growing with the training set, through the use of neural networks as sequential recognition models. The proposed framework is evaluated on the task of dynamical system identification for large-scale datasets, a scenario not readily supported by standard batch inference for RGPs. The promising results indicate that the S-REVARB framework opens up the possibility of applying powerful hierarchical recurrent GP-based models to massive sequential data.
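The mini-batch idea underlying the claimed scalability can be illustrated with a minimal sketch (this is a generic illustration of stochastic variational inference, not the authors' S-REVARB implementation): the data-dependent term of the variational bound is a sum over the N training samples, so a mini-batch of size B, rescaled by N/B, yields an unbiased estimate whose per-step cost is independent of N. All names below (`toy_log_lik`, etc.) are illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
N, B = 1000, 50                 # dataset size and mini-batch size
y = rng.normal(size=N)          # toy observations
theta = 0.3                     # a single toy "variational parameter"

def toy_log_lik(theta, y_batch):
    # Stand-in per-sample log-likelihood term: Gaussian with mean theta.
    # In practice this would be the expected log-likelihood under the
    # variational posterior for the samples in the batch.
    return -0.5 * np.sum((y_batch - theta) ** 2)

# Full-data quantity (cost grows with N)...
full = toy_log_lik(theta, y)

# ...versus rescaled mini-batch estimates over a disjoint partition of the
# data: each costs O(B), and their average recovers the full-data term
# exactly, which shows the rescaled estimator is unbiased.
estimates = [(N / B) * toy_log_lik(theta, y[i:i + B]) for i in range(0, N, B)]
print(bool(np.isclose(np.mean(estimates), full)))  # prints True
```

In a stochastic optimizer (e.g. Adam, reference [49]), one such rescaled batch estimate per step replaces the full sum, which is what decouples the training cost from the number of samples.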

MSC:

68T05 Learning and adaptive systems in artificial intelligence
60G15 Gaussian processes
68T09 Computational aspects of data analysis and big data
Full Text: DOI

References:

[1] Al-Shedivat, M.; Wilson, A. G.; Saatchi, Y.; Hu, Z.; Xing, E. P., Learning scalable deep kernels with recurrent structure, Journal of Machine Learning Research (JMLR), 18, 1, 2850-2886 (2017) · Zbl 1434.68390
[2] Amari, S. I., Natural gradient works efficiently in learning, Neural Computation, 10, 2, 251-276 (1998)
[3] Ambikasaran, S.; Foreman-Mackey, D.; Greengard, L.; Hogg, D. W.; O’Neil, M., Fast direct methods for Gaussian processes, IEEE Transactions on Pattern Analysis and Machine Intelligence, 38, 2, 252-265 (2016)
[4] Ažman, K.; Kocijan, J., Dynamical systems identification using Gaussian process models with incorporated local models, Engineering Applications of Artificial Intelligence, 24, 2, 398-408 (2011)
[5] Barber, D., Bayesian reasoning and machine learning (2012), Cambridge University Press: Cambridge University Press Cambridge, UK · Zbl 1267.68001
[6] Bijl, H., Schön, T. B., van Wingerden, J. W., & Verhaegen, M. (2016). Online sparse Gaussian process training with input noise, arXiv preprint arXiv:1601.08068, https://arxiv.org/abs/1601.08068
[7] Blei, D. M.; Kucukelbir, A.; McAuliffe, J. D., Variational inference: A review for statisticians, Journal of the American Statistical Association, 112, 518, 859-877 (2017)
[8] Bottou, L., Online learning and stochastic approximations, (Saad, D., On-line learning in neural networks (1999), Cambridge University Press), 9-42 · Zbl 0968.68127
[9] Bottou, L., Stochastic learning, (Advanced lectures on machine learning (2004), Springer: Springer New York, NY, USA), 146-168 · Zbl 1120.68426
[10] Brahim-Belhouari, S.; Bermak, A., Gaussian process for nonstationary time series prediction, Computational Statistics & Data Analysis, 47, 4, 705-712 (2004) · Zbl 1429.62420
[11] Bui, T. D.; Hernández-Lobato, D.; Li, Y.; Hernández-Lobato, J. M.; Turner, R. E., Deep Gaussian processes for regression using approximate expectation propagation, Proceedings of The 33rd international conference on machine learning (2016), JMLR.org: JMLR.org New York City, NY, USA
[12] Bui, T. D.; Turner, R. E., Tree-structured Gaussian process approximations, Advances in Neural Information Processing Systems 27, 2213-2221 (2014), NIPS Foundation: NIPS Foundation Montréal, Canada
[13] Bui, T. D.; Turner, R. E., Stochastic variational inference for Gaussian process latent variable models using back constraints, NIPS workshop on black box learning and inference (2015), NIPS Foundation: NIPS Foundation Montréal, Canada
[14] Carli, F. P.; Chiuso, A.; Pillonetto, G., Efficient algorithms for large scale linear system identification using stable spline estimators, Proceedings of the 16th IFAC symposium on system identification, Vol. 45, 119-124 (2012), Elsevier: Elsevier Brussels, Belgium
[15] Cheng, Y.; Wang, Y.; Camps, O.; Sznaier, M., The interplay between big data and sparsity in systems identification: some lessons from machine learning, Proceedings of the 17th IFAC symposium on system identification, Vol. 48, 1285-1292 (2015), Elsevier: Elsevier Beijing, China
[16] Dai, Z.; Damianou, A.; González, J.; Lawrence, N., Variational auto-encoded deep Gaussian processes, International conference on learning representations (2016), ICLR: ICLR San Juan, Puerto Rico, URL https://arxiv.org/pdf/1511.06455
[17] Dai, Z., Damianou, A., Hensman, J., & Lawrence, N. (2014). Gaussian process models with parallelization and GPU acceleration, arXiv preprint arXiv:1410.4984, https://arxiv.org/pdf/1410.4984
[18] Damianou, A., Deep Gaussian processes and variational propagation of uncertainty (2015), University of Sheffield, (Ph.D. thesis)
[19] Deisenroth, M. P.; Ng, J. W., Distributed Gaussian processes, Proceedings of the 32nd international conference on machine learning, 1481-1490 (2015), JMLR.org: JMLR.org Lille, France
[20] Deisenroth, M. P.; Turner, R. D.; Huber, M. F.; Hanebeck, U. D.; Rasmussen, C. E., Robust filtering and smoothing with Gaussian processes, IEEE Transactions on Automatic Control, 57, 7, 1865-1871 (2012) · Zbl 1369.93678
[21] Dezfouli, A.; Bonilla, E. V., Scalable inference for Gaussian process models with black-box likelihoods, Advances in neural information processing systems 28, 1414-1422 (2015), NIPS Foundation: NIPS Foundation Montréal, Canada
[22] Eleftheriadis, S.; Nicholson, T.; Deisenroth, M.; Hensman, J., Identification of Gaussian process state space models, Advances in neural information processing systems, 5309-5319 (2017)
[23] Fletcher, R., Practical methods of optimization (2013), John Wiley & Sons: John Wiley & Sons Hoboken, NJ, USA · Zbl 0905.65002
[24] Frigola-Alcade, R.; Chen, Y.; Rasmussen, C., Variational Gaussian process state-space models, Advances in neural information processing systems 27, 3680-3688 (2014), MIT Press: MIT Press Cambridge, MA, USA
[25] Frigola-Alcade, R.; Lindsten, F.; Schön, T. B.; Rasmussen, C. E., Bayesian inference and learning in Gaussian process state-space models with particle MCMC, Advances in Neural Information Processing Systems 26, 3156-3164 (2013), NIPS Foundation: NIPS Foundation Lake Tahoe, Nevada, USA
[26] Frigola-Alcade, R.; Rasmussen, C. E., Integrated pre-processing for Bayesian nonlinear system identification with Gaussian processes, 52nd IEEE conference on decision and control, 5371-5376 (2013), IEEE: IEEE Firenze, Italy
[27] Gal, Y.; Ghahramani, Z., A theoretically grounded application of dropout in recurrent neural networks, Advances in neural information processing systems, 1019-1027 (2016)
[28] Gal, Y.; van der Wilk, M.; Rasmussen, C. E., Distributed variational inference in sparse Gaussian process regression and latent variable models, Advances in neural information processing systems 27, 3257-3265 (2014), NIPS Foundation: NIPS Foundation Montréal, Canada
[29] Gelman, A., Vehtari, A., Jylänki, P., Robert, C., Chopin, N., & Cunningham, J. P. (2014). Expectation propagation as a way of life, arXiv preprint arXiv:1412.4869, https://arxiv.org/pdf/1412.4869
[30] Girard, A.; Rasmussen, C. E.; Quiñonero-Candela, J.; Murray-Smith, R., Gaussian process priors with uncertain inputs - application to multiple-step ahead time series forecasting, Advances in neural information processing systems 15 (2002), MIT Press: MIT Press Vancouver, Canada
[31] Girard, A.; Rasmussen, C.; Quiñonero-Candela, J.; Murray-Smith, R., Multiple-step ahead prediction for non linear dynamic systems: a Gaussian process treatment with propagation of the uncertainty, (Advances in neural information processing systems 16 (2003), MIT Press: MIT Press Cambridge, MA, USA), 529-536
[32] Glorot, X.; Bordes, A.; Bengio, Y., Deep sparse rectifier neural networks, Proceedings of the 14th international conference on artificial intelligence and statistics, Vol. 15, 315-323 (2011), JMLR.org: JMLR.org Ft. Lauderdale, Florida, USA
[33] Green, P.; Cross, E.; Worden, K., Bayesian system identification of dynamical systems using highly informative training data, Mechanical Systems and Signal Processing, 56, 109-122 (2015)
[34] Green, P.; Maskell, S., Estimating the parameters of dynamical systems from big data using sequential Monte Carlo samplers, Mechanical Systems and Signal Processing, 93, 379-396 (2017)
[35] He, K.; Zhang, X.; Ren, S.; Sun, J., Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification, Proceedings of the IEEE international conference on computer vision, 1026-1034 (2015), IEEE: IEEE Santiago, Chile
[36] Hensman, J.; Damianou, A.; Lawrence, N., Opening the way for deep Gaussian processes on massive data, (International conference on artificial intelligence and statistics (2014))
[37] Hensman, J., Durrande, N., & Solin, A. (2016). Variational Fourier features for Gaussian processes, arXiv preprint arXiv:1611.06740, https://arxiv.org/abs/1611.06740
[38] Hensman, J.; Fusi, N.; Lawrence, N. D., Gaussian processes for big data, 29th conference on uncertainty in artificial intelligence, 282-290 (2013), AUAI Press: AUAI Press Bellevue, Washington, USA
[39] Hensman, J.; Matthews, A. G.d. G.; Ghahramani, Z., Scalable variational Gaussian process classification, Proceedings of the 18th international conference on artificial intelligence and statistics (2015), JMLR.org: JMLR.org San Diego, California, USA
[40] Hjalmarsson, H.; Rojas, C. R.; Rivera, D. E., System identification: a Wiener-Hammerstein benchmark, Control Engineering Practice, 20, 11, 1095-1096 (2012)
[41] Hoang, T. N.; Hoang, Q. M.; Low, B. K.H., A unifying framework of anytime sparse Gaussian process regression models with stochastic variational inference for big data, Proceedings of the 32nd international conference on machine learning, 569-578 (2015), JMLR.org: JMLR.org Lille, France
[42] Hoang, T. N.; Hoang, Q. M.; Low, B. K.H., A distributed variational inference framework for unifying parallel sparse Gaussian process regression models, Proceedings of the 33rd international conference on machine learning, 382-391 (2016), JMLR.org: JMLR.org New York City, NY, USA
[43] Hochreiter, S.; Schmidhuber, J., Long short-term memory, Neural Computation, 9, 8, 1735-1780 (1997)
[44] Hoffman, M. D.; Blei, D. M.; Wang, C.; Paisley, J. W., Stochastic variational inference, Journal of Machine Learning Research (JMLR), 14, 1, 1303-1347 (2013) · Zbl 1317.68163
[45] Isermann, R.; Münchhof, M., Identification of dynamic systems (2011), Springer-Verlag: Springer-Verlag Berlin/Heidelberg, Germany
[46] Jordan, M. I.; Ghahramani, Z.; Jaakkola, T. S.; Saul, L. K., An introduction to variational methods for graphical models, Machine Learning, 37, 2, 183-233 (1999) · Zbl 0945.68164
[47] Kantas, N.; Doucet, A.; Singh, S. S.; Maciejowski, J.; Chopin, N., On particle methods for parameter estimation in state-space models, Statistical Science, 30, 3, 328-351 (2015) · Zbl 1332.62096
[48] Kim, Y.; Mallick, R.; Bhowmick, S.; Chen, B.-L., Nonlinear system identification of large-scale smart pavement systems, Expert Systems with Applications, 40, 9, 3551-3560 (2013)
[49] Kingma, D.; Ba, J., Adam: A method for stochastic optimization, International conference on learning representations (2015), ICLR: ICLR San Diego, California, USA
[50] Kingma, D. P.; Welling, M., Auto-encoding variational Bayes (2014)
[51] Kocijan, J., Modelling and control of dynamic systems using Gaussian process models (2016), Springer: Springer New York, NY, USA · Zbl 1339.93004
[52] Kocijan, J.; Girard, A.; Banko, B.; Murray-Smith, R., Dynamic systems identification with Gaussian processes, Mathematical and Computer Modelling of Dynamical Systems, 11, 4, 411-424 (2005) · Zbl 1122.93081
[53] Lawrence, N. D., Gaussian process latent variable models for visualisation of high dimensional data, Advances in neural information processing systems 17, 329-336 (2004), MIT Press: MIT Press Vancouver and Whistler, British Columbia, Canada
[54] Li, X. M.; Ouyang, J. H., Tuning the learning rate for stochastic variational inference, Journal of Computer Science and Technology, 31, 2, 428 (2016)
[55] Liu, H., Ong, Y. S., Shen, X., & Cai, J. (2018). When Gaussian process meets big data: A review of scalable GPs, arXiv preprint arXiv:1807.01065
[56] Marconato, A.; Sjöberg, J.; Suykens, J.; Schoukens, J., Identification of the silverbox benchmark using nonlinear state-space models, Proceedings of the 16th IFAC symposium on system identification, 632-637 (2012), Elsevier: Elsevier Brussels, Belgium
[57] Mattos, C. L.C.; Dai, Z.; Damianou, A.; Barreto, G. A.; Lawrence, N. D., Deep recurrent Gaussian processes for outlier-robust system identification, Journal of Process Control, 60, 82-94 (2017)
[58] Mattos, C. L.C.; Dai, Z.; Damianou, A.; Forth, J.; Barreto, G. A.; Lawrence, N. D., Recurrent Gaussian processes, International conference on learning representations (2016), ICLR: ICLR San Juan, Puerto Rico, URL https://arxiv.org/pdf/1511.06644
[59] Mattos, C. L.C.; Santos, J. D.A.; Barreto, G. A., An empirical evaluation of robust Gaussian process models for system identification, (Intelligent data engineering and automated learning (2015), Springer: Springer Wroclaw, Poland), 172-180
[60] Minka, T. P., Expectation propagation for approximate Bayesian inference, Proceedings of the 17th conference on uncertainty in artificial intelligence, 362-369 (2001), Morgan Kaufmann: Morgan Kaufmann Seattle, WA, USA
[61] Murray-Smith, R.; Johansen, T. A.; Shorten, R., On transient dynamics, off-equilibrium behaviour and identification in blended multiple model structures, European control conference (1999), Springer: Springer Karlsruhe, Germany
[62] Neal, R. M., Monte Carlo implementation of Gaussian process models for Bayesian regression and classification, Tech. rep. (1997), University of Toronto, Dept. of Statistics: University of Toronto, Dept. of Statistics Toronto, Canada
[63] Nelles, O., Nonlinear system identification: from classical approaches to neural networks and fuzzy models (2013), Springer Science & Business Media: Springer Science & Business Media Berlin, Germany
[64] Nguyen, T.; Bonilla, E., Fast allocation of Gaussian process experts, Proceedings of the 31st international conference on machine learning, 145-153 (2014), JMLR.org: JMLR.org Beijing, China
[65] Nickson, T., Gunter, T., Lloyd, C., Osborne, M. A., & Roberts, S. (2015). Blitzkriging: Kronecker-structured stochastic Gaussian processes, arXiv preprint arXiv:1510.07965, https://arxiv.org/pdf/1510.07965
[66] Quiñonero-Candela, J.; Rasmussen, C. E., A unifying view of sparse approximate Gaussian process regression, The Journal of Machine Learning Research (JMLR), 6, 1939-1959 (2005) · Zbl 1222.68282
[67] R Core Team, R: A language and environment for statistical computing (2017), R Foundation for Statistical Computing: R Foundation for Statistical Computing Vienna, Austria, URL https://www.R-project.org/
[68] Raissi, M. (2017). Parametric Gaussian process regression for big data, arXiv preprint arXiv:1704.03144, https://arxiv.org/pdf/1704.03144
[69] Ranganath, R.; Wang, C.; David, B.; Xing, E., An adaptive learning rate for stochastic variational inference, Proceedings of the 30th international conference on machine learning, 298-306 (2013), JMLR.org: JMLR.org Atlanta, USA
[70] Rasmussen, C.; Williams, C., Gaussian processes for machine learning (2006), MIT Press: MIT Press Cambridge, MA, USA · Zbl 1177.68165
[71] Reece, S.; Roberts, S., An introduction to Gaussian processes for the Kalman filter expert, 13th conference on information fusion, 1-9 (2010), IEEE: IEEE Edinburgh, UK
[72] Rezende, D. J.; Mohamed, S.; Wierstra, D., Stochastic backpropagation and approximate inference in deep generative models, Proceedings of the 31st international conference on machine learning (2014), JMLR.org: JMLR.org Beijing, China
[73] Robbins, H.; Monro, S., A stochastic approximation method, The Annals of Mathematical Statistics, 400-407 (1951) · Zbl 0054.05901
[74] Rottmann, A.; Burgard, W., Learning non-stationary system dynamics online using Gaussian processes, (Joint pattern recognition symposium. Joint pattern recognition symposium, Lecture notes in computer science, Vol. 6373 (2010), Springer), 192-201
[75] Rumelhart, D. E.; Hinton, G. E.; Williams, R. J., Learning internal representations by error propagation, Tech. rep., 318-362 (1985), MIT Press, DTIC Document: MIT Press, DTIC Document Cambridge, Massachusetts, USA
[76] Salimbeni, H., & Deisenroth, M. (2017). Doubly stochastic variational inference for deep Gaussian processes, arXiv preprint arXiv:1705.08933, https://arxiv.org/abs/1705.08933
[77] Santos, J. D.A.; Barreto, G. A., A regularized estimation framework for online sparse LSSVR models, Neurocomputing, 238, 114-125 (2017)
[78] Särkkä, S.; Solin, A.; Hartikainen, J., Spatiotemporal learning via infinite-dimensional Bayesian filtering and smoothing: A look at Gaussian process regression through Kalman filtering, IEEE Signal Processing Magazine, 30, 4, 51-61 (2013)
[79] Schön, T. B.; Lindsten, F.; Dahlin, J.; Wågberg, J.; Naesseth, C. A.; Svensson, A., Sequential Monte Carlo methods for system identification, Proceedings of the 17th IFAC symposium on system identification, 775-786 (2015), Elsevier: Elsevier Beijing, China
[80] Schoukens, J.; Nemeth, J. G.; Crama, P.; Rolain, Y.; Pintelon, R., Fast approximate identification of nonlinear systems, Automatica, 39, 7, 1267-1274 (2003) · Zbl 1032.93011
[81] Schoukens, J.; Suykens, J.; Ljung, L., Wiener-Hammerstein benchmark, Proceedings of the 15th IFAC symposium on system identification (2009), Elsevier: Elsevier Saint-Malo, France
[82] Snoek, J.; Larochelle, H.; Adams, R. P., Practical Bayesian optimization of machine learning algorithms, Advances in neural information processing systems 25, 2951-2959 (2012), NIPS Foundation: NIPS Foundation Lake Tahoe, Nevada, USA
[83] Solak, E.; Murray-Smith, R.; Leithead, W. E.; Leith, D. J.; Rasmussen, C. E., Derivative observations in Gaussian process models of dynamic systems, (Advances in neural information processing systems 16 (2003)), 16
[84] Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R., Dropout: a simple way to prevent neural networks from overfitting, Journal of Machine Learning Research (JMLR), 15, 1, 1929-1958 (2014) · Zbl 1318.68153
[85] Svensson, A.; Solin, A.; Särkkä, S.; Schön, T. B., Computationally efficient Bayesian learning of Gaussian process state space models, Proceedings of the 19th international conference on artificial intelligence and statistics, 213-221 (2016), JMLR.org: JMLR.org Cadiz, Spain
[86] Talbi, E.-G., Metaheuristics: from design to implementation, Vol. 74 (2009), John Wiley & Sons: John Wiley & Sons Hoboken, NJ, USA · Zbl 1190.90293
[87] The MathWorks Inc., Nonlinear modeling of a magneto-rheological fluid damper, URL http://www.mathworks.com/help/ident/examples/nonlinear-modeling-of-a-magneto-rheological-fluid-damper.html
[88] Titsias, M. K., Variational model selection for sparse Gaussian process regression, Tech. rep., Manchester, UK, Technical report (2009), School of Computer Science, University of Manchester
[89] Titsias, M. K., Variational learning of inducing variables in sparse Gaussian processes, Proceedings of the 12th international conference on artificial intelligence and statistics, 567-574 (2009), JMLR.org: JMLR.org Clearwater Beach, FL, USA
[90] Titsias, M. K.; Lawrence, N. D., Bayesian Gaussian process latent variable model, Proceedings of the 13th international conference on artificial intelligence and statistics, 844-851 (2010), JMLR.org: JMLR.org Sardinia, Italy
[91] Turner, R. D.; Deisenroth, M. P.; Rasmussen, C. E., State-space inference and learning with Gaussian processes, Proceedings of the 13th international conference on artificial intelligence and statistics, 868-875 (2010), JMLR.org: JMLR.org Sardinia, Italy
[92] Vuković, N.; Miljković, Z., Robust sequential learning of feedforward neural networks in the presence of heavy-tailed noise, Neural Networks, 63, 31-47 (2015) · Zbl 1325.68202
[93] Wainwright, M. J.; Jordan, M. I., Graphical models, exponential families, and variational inference, Foundations and Trends in Machine Learning, 1, 1-2, 1-305 (2008) · Zbl 1193.62107
[94] Wang, J.; Sano, A.; Chen, T.; Huang, B., Identification of Hammerstein systems without explicit parameterisation of non-linearity, International Journal of Control, 82, 5, 937-952 (2009) · Zbl 1165.93306
[95] Wigren, T.; Schoukens, J., Data for benchmarking in nonlinear system identification, Tech. rep., Uppsala, Sweden, Technical report (2013), Uppsala University, Dept. of Information Technology
[96] Williams, R. J.; Zipser, D., Gradient-based learning algorithms for recurrent networks and their computational complexity, Back-Propagation: Theory, Architectures and Applications, 433-486 (1995)
[97] Wilson, A. G., Dann, C., & Nickisch, H. (2015). Thoughts on massively scalable Gaussian processes, arXiv preprint arXiv:1511.01870, https://arxiv.org/pdf/1511.01870
[98] Wilson, A. G.; Hu, Z.; Salakhutdinov, R.; Xing, E. P., Deep kernel learning, Proceedings of the 19th international conference on artificial intelligence and statistics, 370-378 (2016), JMLR.org: JMLR.org Cadiz, Spain
[99] Wilson, A. G.; Hu, Z.; Salakhutdinov, R. R.; Xing, E. P., Stochastic variational deep kernel learning, Advances in neural information processing systems 29, 2586-2594 (2016), NIPS Foundation: NIPS Foundation Barcelona, Spain
[100] Wilson, A.; Nickisch, H., Kernel interpolation for scalable structured Gaussian processes (KISS-GP), Proceedings of the 32nd international conference on machine learning, 1775-1784 (2015), JMLR.org: JMLR.org Lille, France
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.