
Generalized Kalman smoothing: modeling and algorithms. (English) Zbl 1375.93144

Summary: State-space smoothing has found many applications in science and engineering. Under linear and Gaussian assumptions, smoothed estimates can be obtained using efficient recursions, for example the Rauch-Tung-Striebel and Mayne-Fraser algorithms. Such schemes are equivalent to linear-algebraic techniques that minimize a convex quadratic objective function whose structure is induced by the dynamic model.
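The equivalence stated above can be checked numerically. Below is a minimal sketch (not code from the paper) for a scalar linear-Gaussian model with assumed illustrative values for the transition coefficient, noise variances, and prior: the Rauch-Tung-Striebel smoother and the batch least-squares (MAP) solution of the induced quadratic objective coincide.

```python
import numpy as np

# Illustrative scalar model: x_{k+1} = a x_k + w_k,  y_k = x_k + v_k,
# with w_k ~ N(0, q), v_k ~ N(0, r), and prior x_0 ~ N(0, p0).
rng = np.random.default_rng(0)
N, a, q, r, p0 = 50, 0.9, 0.1, 0.5, 1.0

x = np.zeros(N)
for k in range(1, N):
    x[k] = a * x[k - 1] + np.sqrt(q) * rng.standard_normal()
y = x + np.sqrt(r) * rng.standard_normal(N)

# Forward Kalman filter, then backward Rauch-Tung-Striebel pass.
xp, pp = np.zeros(N), np.zeros(N)   # predicted mean / variance
xf, pf = np.zeros(N), np.zeros(N)   # filtered mean / variance
xp[0], pp[0] = 0.0, p0
for k in range(N):
    if k > 0:
        xp[k], pp[k] = a * xf[k - 1], a**2 * pf[k - 1] + q
    K = pp[k] / (pp[k] + r)
    xf[k] = xp[k] + K * (y[k] - xp[k])
    pf[k] = (1 - K) * pp[k]
xs = xf.copy()
for k in range(N - 2, -1, -1):
    G = a * pf[k] / pp[k + 1]
    xs[k] = xf[k] + G * (xs[k + 1] - xp[k + 1])

# Batch MAP estimate: stack prior, measurement, and process residuals
# into one weighted linear least-squares problem and solve it.
rows, rhs = [], []
e = np.eye(N)
rows.append(e[0] / np.sqrt(p0)); rhs.append(0.0)             # prior on x_0
for k in range(N):
    rows.append(e[k] / np.sqrt(r)); rhs.append(y[k] / np.sqrt(r))
for k in range(N - 1):
    rows.append((e[k + 1] - a * e[k]) / np.sqrt(q)); rhs.append(0.0)
x_map = np.linalg.lstsq(np.vstack(rows), np.array(rhs), rcond=None)[0]

assert np.allclose(xs, x_map, atol=1e-8)
```

The assertion passes because, for linear-Gaussian models, the smoothed posterior mean is exactly the minimizer of the quadratic MAP objective.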
These classical formulations fall short in many important circumstances. For instance, smoothers obtained using quadratic penalties can fail when outliers are present in the data, and cannot track impulsive inputs or abrupt state changes. Motivated by these shortcomings, generalized Kalman smoothing formulations have been proposed in recent years, replacing quadratic models with more suitable, often nonsmooth, convex functions. In contrast to classical models, these general estimators require the use of iterative algorithms, which have received increased attention from the control, signal processing, machine learning, and optimization communities.
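To illustrate the outlier issue, the sketch below (assumed model values and an injected outlier, not an example from the paper) replaces the quadratic measurement penalty with the Huber loss and solves the resulting convex problem by iteratively reweighted least squares (IRLS), one standard approach among the iterative algorithms mentioned above.

```python
import numpy as np

# Random-walk state with one gross measurement outlier; q, r, delta
# and the outlier magnitude are illustrative assumptions.
rng = np.random.default_rng(1)
N, q, r, delta = 40, 0.05, 0.1, 1.0

x_true = np.cumsum(np.sqrt(q) * rng.standard_normal(N))
y = x_true + np.sqrt(r) * rng.standard_normal(N)
y[20] += 10.0                               # inject a gross outlier

e = np.eye(N)
D = e[1:] - e[:-1]                          # first-difference (process) matrix

def smooth(weights):
    # Weighted least squares: measurement rows scaled by sqrt(w_k / r),
    # process rows by 1 / sqrt(q).
    A = np.vstack([np.sqrt(weights / r)[:, None] * e, D / np.sqrt(q)])
    b = np.concatenate([np.sqrt(weights / r) * y, np.zeros(N - 1)])
    return np.linalg.lstsq(A, b, rcond=None)[0]

x_quad = smooth(np.ones(N))                 # classical quadratic smoother

w = np.ones(N)                              # IRLS for the Huber loss:
for _ in range(30):                         # large residuals get weight
    x_hub = smooth(w)                       # delta / |residual| < 1
    res = np.abs(y - x_hub)
    w = np.where(res <= delta, 1.0, delta / np.maximum(res, 1e-12))

# The robust estimate is pulled far less by the outlier at k = 20.
assert abs(x_hub[20] - x_true[20]) < abs(x_quad[20] - x_true[20])
```

The down-weighting step is exactly where the nonsmooth penalty acts: the Huber loss is quadratic for small residuals and linear for large ones, so the single corrupted sample contributes only a bounded pull on the estimate.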
In this survey, we show that the optimization viewpoint gives the control and signal processing communities great freedom in developing novel modeling and inference frameworks for dynamical systems. We discuss general statistical models for dynamic systems, making full use of nonsmooth convex penalties and constraints, and providing links to important models in signal processing and machine learning. We also survey optimization techniques for these formulations, paying close attention to dynamic problem structure. Modeling concepts and algorithms are illustrated with numerical examples.
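A concrete instance of "dynamic problem structure": for a scalar random-walk smoothing problem, the normal equations are tridiagonal (block-tridiagonal in the vector case), so they can be solved in O(N) by banded elimination instead of a dense O(N^3) solve. The sketch below, with assumed model values, verifies a hand-rolled Thomas-algorithm solve against a dense reference solve.

```python
import numpy as np

# Normal equations H x = rhs for
#   min  x_0^2/p0 + sum_k (y_k - x_k)^2/r + sum_k (x_{k+1} - x_k)^2/q,
# whose Hessian H is tridiagonal. Model values are illustrative.
rng = np.random.default_rng(2)
N, q, r, p0 = 200, 0.1, 0.5, 1.0
y = rng.standard_normal(N)

diag = np.full(N, 1 / r + 2 / q)
diag[0] = 1 / r + 1 / q + 1 / p0
diag[-1] = 1 / r + 1 / q
off = np.full(N - 1, -1 / q)
rhs = y / r

def thomas(sub, d, sup, b):
    """O(N) tridiagonal solve: forward sweep, then back-substitution."""
    d, b = d.copy(), b.copy()
    for i in range(1, len(d)):
        m = sub[i - 1] / d[i - 1]
        d[i] -= m * sup[i - 1]
        b[i] -= m * b[i - 1]
    x = np.empty_like(b)
    x[-1] = b[-1] / d[-1]
    for i in range(len(d) - 2, -1, -1):
        x[i] = (b[i] - sup[i] * x[i + 1]) / d[i]
    return x

x_fast = thomas(off, diag, off, rhs)

# Dense reference solve on the explicitly formed H.
H = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
x_dense = np.linalg.solve(H, rhs)
assert np.allclose(x_fast, x_dense)
```

The same banded factorization idea underlies the efficient smoothing recursions: exploiting the band structure is what keeps the per-iteration cost of the generalized (iterative) formulations linear in the horizon length.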

MSC:

93E25 Computational methods in stochastic control (MSC2010)
93E10 Estimation and detection in stochastic control theory
