×

A scaled gradient projection method for Bayesian learning in dynamical systems. (English) Zbl 1461.65133

Summary: A crucial task in system identification problems is the selection of the most appropriate model class and is classically addressed resorting to cross-validation or using order selection criteria based on asymptotic arguments. As recently suggested in the literature, this can be addressed in a Bayesian framework, where model complexity is regulated by a few hyperparameters, which can be estimated via marginal likelihood maximization. It is thus of primary importance to design effective optimization methods to solve the corresponding optimization problem. If the unknown impulse response is modeled as a Gaussian process with a suitable kernel, the maximization of the marginal likelihood leads to a challenging nonconvex optimization problem, which requires a stable and effective solution strategy. In this paper we address this problem by means of a scaled gradient projection algorithm, in which the scaling matrix and the steplength parameter play a crucial role to provide a meaningful solution in a computational time comparable with second order methods. In particular, we propose both a generalization of the split gradient approach to design the scaling matrix in the presence of box constraints and an effective implementation of the gradient and objective function. The extensive numerical experiments carried out on several test problems show that our method is very effective in providing in a few tenths of a second solutions of the problems with accuracy comparable with state-of-the-art approaches. Moreover, the flexibility of the proposed strategy makes it easily adaptable to a wider range of problems arising in different areas of machine learning, signal processing, and system identification.

MSC:

65K05 Numerical mathematical programming methods
90C30 Nonlinear programming
90C90 Applications of mathematical programming
93B30 System identification

Software:

CVXOPT

References:

[1] M. Andersen, J. Dahl, and L. Vandenberghe, {\it Cvxopt: Python Software for Convex Optimization, Version} 1.1.6, http://cvxopt.orghttp://cvxopt.org, 2013.
[2] A. Aravkin, J. Burke, A. Chiuso, and G. Pillonetto, {\it Convex vs non-convex estimators for regression and sparse estimation: The mean squared error properties of ARD and GLasso}, J. Mach. Learn. Res., 15 (2014), pp. 217-252. · Zbl 1318.62238
[3] M. Ayazoglu and M. Sznaier, {\it An algorithm for fast constrained nuclear norm minimization and applications to systems identification}, in Proceedings of IEEE CDC, 2012, pp. 3469-3475.
[4] F. R. Bach, G. R. G. Lanckriet, and M. I. Jordan, {\it Multiple kernel learning, conic duality, and the SMO algorithm}, in Proceedings of the 21st International Conference on Machine Learning, New York, 2004, ACM, pp. 41-48.
[5] M. Bertero, {\it Linear inverse and ill-posed problems}, Adv. Electron. El. Phys., 75 (1989), pp. 1-120.
[6] M. Bertero, H. Lantéri, and L. Zanni, {\it Iterative image reconstruction: A point of view}, in Mathematical Methods in Biomedical Imaging and Intensity-Modulated Radiation Therapy (IMRT), Y. Censor, M. Jiang, and A. K. Louis, eds., Birkhäuser-Verlag, Pisa, Italy, 2008, pp. 37-63.
[7] D. P. Bertsekas, {\it Nonlinear Programming}, 2nd ed., Athena Scientific, Belmont, MA, 1999. · Zbl 1015.90077
[8] S. Bonettini, {\it Inexact block coordinate descent methods with application to non-negative matrix factorization}, IMA J. Numer. Anal., 31 (2011), pp. 1431-1452. · Zbl 1235.65061
[9] S. Bonettini, A. Cornelio, and M. Prato, {\it A new semiblind deconvolution approach for Fourier-based image restoration: An application in astronomy}, SIAM J. Imaging Sci., 6 (2013), pp. 1736-1757. · Zbl 1282.94010
[10] S. Bonettini, G. Landi, E. Loli Piccolomini, and L. Zanni, {\it Scaling techniques for gradient projection-type methods in astronomical image deblurring}, Int. J. Comput. Math., 90 (2013), pp. 9-29. · Zbl 1278.68326
[11] S. Bonettini and M. Prato, {\it Nonnegative image reconstruction from sparse Fourier data: A new deconvolution algorithm}, Inverse Problems, 26 (2010), 095001. · Zbl 1201.94010
[12] S. Bonettini, R. Zanella, and L. Zanni, {\it A scaled gradient projection method for constrained image deblurring}, Inverse Problems, 25 (2009), 015002. · Zbl 1155.94011
[13] G. Box, G. M. Jenkins, and G. Reinsel, {\it Time Series Analysis: Forecasting & Control}, 3rd ed., Prentice-Hall, Englewood Cliffs, NJ, 1994. · Zbl 0858.62072
[14] S. Boyd and L. Vandeberghe, {\it Convex Optimization}, Cambridge University Press, Cambridge, 2004. · Zbl 1058.90049
[15] T. Chen, M. S. Andersen, L. Ljung, A. Chiuso, and G. Pillonetto, {\it System identification via sparse multiple kernel-based regularization using sequential convex optimization techniques}, IEEE Trans. Automat. Control, 59 (2014), pp. 2933-2945. · Zbl 1360.93720
[16] T. Chen and L. Ljung, {\it Implementation of algorithms for tuning parameters in regularized least squares problems in system identification}, Automatica, 49 (2013), pp. 2213-2220. · Zbl 1364.93825
[17] T. Chen, H. Ohlsson, and L. Ljung, {\it On the estimation of transfer functions, regularizations and Gaussian processes–revisited}, Automatica, 48 (2012), pp. 1525-1535. · Zbl 1269.93126
[18] A. Chiuso, T. Chen, L. Ljung, and G. Pillonetto, {\it On the design of multiple kernels for nonparametric linear system identification}, in Proceedings of IEEE CDC, 2014, pp. 3346-3351.
[19] A. Chiuso and G. Pillonetto, {\it A Bayesian approach to sparse dynamic network identification}, Automatica, 48 (2012), pp. 1553-1565. · Zbl 1267.93172
[20] M. E. Daube-Witherspoon and G. Muehllehner, {\it Iterative image space reconstruction algorithm suitable for volume ECT}, IEEE Trans. Med. Imaging, 5 (1986), pp. 61-66.
[21] F. Dinuzzo, {\it Kernels for Linear Time Invariant System Identification}, CoRR abs/1203.4930, 2012. · Zbl 1329.93049
[22] T. Doan, R. Litterman, and C. A. Sims, {\it Forecasting and conditional projection using realistic prior distributions}, Econometric Rev., 3 (1984), pp. 1-100. · Zbl 0613.62142
[23] E. D. Dolan and J. J. Moré, {\it Benchmarking optimization software with performance profiles}, Math. Progam. Ser. A, 91 (2002), pp. 201-213. · Zbl 1049.90004
[24] D. Donoho, {\it Compressed sensing}, IEEE Trans. Inform. Theory, 52 (2006), pp. 1289-1306. · Zbl 1288.94016
[25] M. Fazel, H. Hindi, and S. P. Boyd, {\it A rank minimization heuristic with application to minimum order system approximation}, in Proceedings of the American Control Conference, Vol. 6, 2001, pp. 4734-4739.
[26] G. Frassoldati, G. Zanghirati, and L. Zanni, {\it New adaptive stepsize selections in gradient methods}, J. Ind. Manage. Optim., 4 (2008), pp. 299-312. · Zbl 1161.90524
[27] G. C. Goodwin, M. Gevers, and B. Ninness, {\it Quantifying the error in estimated transfer functions with application to model order selection}, IEEE Trans. Automat. Control, 37 (1992), pp. 913-928. · Zbl 0767.93022
[28] W. W. Hager, B. A. Mair, and H. Zhang, {\it An affine-scaling interior-point CBB method for box-constrained optimization}, Math. Program. Ser. A, 119 (2009), pp. 1-32. · Zbl 1168.90007
[29] T. Kailath, {\it Linear Systems}, Prentice-Hall, Englewood Cliffs, NJ, 1980. · Zbl 0454.93001
[30] G. Kitagawa and H. Gersh, {\it A smoothness priors-state space modeling of time series with trend and seasonality}, J. Amer. Statist. Assoc., 79 (1984), pp. 378-389.
[31] H. Lanteri, M. Roche, and C. Aime, {\it Penalized maximum likelihood image restoration with positivity constraints: Multiplicative algorithms}, Inverse Problems, 28 (2002), pp. 1397-1419. · Zbl 1023.62099
[32] J. B. Lasserre, {\it A trace inequality for matrix product}, IEEE Trans. Automat. Control, 40 (1995), pp. 1500-1501. · Zbl 0835.15013
[33] D. D. Lee and H. S. Seung, {\it Learning the parts of objects by non-negative matrix factorization}, Nature, 401 (1999), pp. 788-791. · Zbl 1369.68285
[34] L. Ljung, {\it System Identification–Theory for the User}, 2nd ed., Prentice-Hall, Englewood Cliffs, NJ, 1999. · Zbl 0615.93004
[35] L. Ljung, H. Hjalmarsson, and H. Ohlsson, {\it Four encounters with system identification}, Eur. J. Control, 17 (2011), pp. 449-471. · Zbl 1259.93044
[36] D. J. C. MacKay, {\it Bayesian non-linear modelling for the prediction competition}, in ASHRAE Transactions 100, Pt. 2, ASHRAE, Atlanta, 1994, pp. 1053-1062.
[37] J. S. Maritz and T. Lwin, {\it Empirical Bayes Method}, Chapman and Hall, London, UK, 1989. · Zbl 0731.62040
[38] M. Merritt and Y. Zhang, {\it Interior-point gradient method for large-scale totally nonnegative least squares problems}, J. Optim. Theory Appl., 126 (2005), pp. 191-202. · Zbl 1093.90084
[39] G. Pillonetto and A. Chiuso, {\it Tuning complexity in regularized kernel-based regression and linear system identification: The robustness of the marginal likelihood estimator}, Automatica, to appear. · Zbl 1330.93235
[40] G. Pillonetto, A. Chiuso, and G. De Nicolao, {\it Prediction error identification of linear systems: A nonparametric Gaussian regression approach}, Automatica, 47 (2011), pp. 291-305. · Zbl 1207.93110
[41] G. Pillonetto and G. De Nicolao, {\it A new kernel-based approach for linear system identification}, Automatica, 46 (2010), pp. 81-93. · Zbl 1214.93116
[42] G. Pillonetto, F. Dinuzzo, T. Chen, G. De Nicolao, and L. Ljung, {\it Kernel methods in system identification, machine learning and function estimation: A survey}, Automatica, 50 (2014), pp. 657-682. · Zbl 1298.93342
[43] M. Prato, R. Cavicchioli, L. Zanni, P. Boccacci, and M. Bertero, {\it Efficient deconvolution methods for astronomical imaging: Algorithms and IDL-GPU codes}, Astronom. Astrophys., 539 (2012), A133.
[44] C. E. Rasmussen and C. K. I. Williams, {\it Gaussian Processes for Machine Learning}, MIT Press, Cambridge, MA, 2006. · Zbl 1177.68165
[45] L. A. Shepp and Y. Vardi, {\it Maximum likelihood reconstruction for emission tomography}, IEEE Trans. Med. Imaging, 1 (1982), pp. 113-122.
[46] T. Soderstrom and P. Stoica, {\it System Identification}, Prentice-Hall, Englewood Cliffs, NJ, 1989. · Zbl 0695.93108
[47] B. K. Sriperumbudur and G. R. G. Lanckriet, {\it On the convergence of the concave-convex procedure}, Adv. Neural Inf. Process. Syst., 22 (2009), pp. 1391-1407. · Zbl 1254.90180
[48] R. Tibshirani, {\it Regression shrinkage and selection via the LASSO}, J. Roy. Statist. Soc. B, 58 (1996), pp. 267-288. · Zbl 0850.62538
[49] M. Tipping, {\it Sparse Bayesian learning and the relevance vector machine}, J. Mach. Learn. Res., 1 (2001), pp. 211-244. · Zbl 0997.68109
[50] D. P. Wipf, B. D. Rao, and S. Nagarajan, {\it Latent variable Bayesian models for promoting sparsity}, IEEE Trans. Inform. Theory, 57 (2011), pp. 6236-6255. · Zbl 1365.62103
[51] R. Zanella, P. Boccacci, L. Zanni, and M. Bertero, {\it Efficient gradient projection methods for edge-preserving removal of Poisson noise}, Inverse Problems, 25 (2009), 045010. · Zbl 1163.65042
[52] R. Zanella, G. Zanghirati, R. Cavicchioli, L. Zanni, P. Boccacci, M. Bertero, and G. Vicidomini, {\it Towards real-time image deconvolution: Application to confocal and STED microscopy}, Sci. Rep., 3 (2013), 2523.
[53] Y. Zhu, {\it Multivariable System Identification for Process Control}, Elsevier Science, New York, 2001.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.