Abstract
Motivated by the need to analyze readily available data collected in space and time, especially in environmental sciences, we propose a parsimonious spatiotemporal model for time series data on a spatial grid. In essence, our model is a vector autoregressive model that utilizes the spatial structure to achieve parsimony of autoregressive matrices at two levels. The first level ensures the sparsity of the autoregressive matrices using a lagged-neighborhood scheme. The second level performs a spatial clustering of the nonzero autoregressive coefficients such that within some subregions, nearby locations share the same autoregressive coefficients while across different subregions the coefficients may have distinct values. The model parameters are estimated using the penalized maximum likelihood with an adaptive fused Lasso penalty. The estimation procedure can be tailored to accommodate the need and prior knowledge of a modeler. Performance of the proposed estimation algorithm is examined in a simulation study. Our method gives reliable estimation results that are interpretable and especially useful to identify geographical subregions, within each of which, the time series have similar dynamical behavior with homogeneous autoregressive coefficients. We apply our model to a wind speed time series dataset generated from a climate model over Saudi Arabia to illustrate its power in explaining the dynamics by the spatially structured coefficients. Moreover, the estimated model can be useful for building stochastic weather generators as an approximation of the computationally expensive climate model.
Similar content being viewed by others
References
Ailliot P, Monbet V, Prevosto M (2006) An autoregressive model with time-varying coefficients for wind fields. Environmetrics 17(2):107–117
Arnold TB, Tibshirani RJ (2020) genlasso: path algorithm for generalized Lasso problems. R Package Vers 1:5
Bańbura M, Giannone D, Reichlin L (2010) Large Bayesian vector auto regressions. J Appl Econ 25(1):71–92
Basu S, Michailidis G (2015) Regularized estimation in sparse high-dimensional time series models. Ann Stat 43(4):1535–1567
Bergmeir C, Benítez JM (2012) On the use of cross-validation for time series predictor evaluation. Inf Sci 191:192–213
Bessac J, Ailliot P, Monbet V (2015) Gaussian linear state-space model for wind fields in the North-East Atlantic. Environmetrics 26(1):29–38
Cressie N, Wikle CK (2011) Statistics for spatio-temporal data. Wiley, Hoboken
de Luna X, Genton MG (2005) Predictive spatio-temporal models for spatially sparse environmental data. Statistica Sinica 15:547–568
Fan J, Zhang W (1999) Statistical estimation in varying coefficient models. Ann Stat 27(5):1491–1518
Gelfand AE, Kim H-J, Sirmans CF, Banerjee S (2003) Spatial modeling with spatially varying coefficient processes. J Am Stat Assoc 98(462):387–396
Genton M, Johnson C, Potter K, Stenchikov G, Sun Y (2014) Surface boxplots. Stat 3(1):1–11
Hsu N-J, Hung H-L, Chang Y-M (2008) Subset selection for vector autoregressive processes using Lasso. Comput Stat Data Anal 52(7):3645–3657
Huang H-C, Hsu N-J, Theobald DM, Breidt FJ (2010) Spatial Lasso with applications to GIS model selection. J Comput Gr Stat 19(4):963–983
Kastner G, Huber F (2020) Sparse Bayesian vector autoregressions in huge dimensions. J Forecast 39:1142–1165
Katzfuss M, Cressie N (2012) Bayesian hierarchical spatio-temporal smoothing for very large datasets. Environmetrics 23(1):94–107
Kay JE, Deser C, Phillips A, Mai A, Hannay C, Strand G, Arblaster JM, Bates SC, Danabasoglu G, Edwards J, Holland M, Kushner P, Lamarque J-F, Lawrence D, Lindsay K, Middleton A, Munoz E, Neale R, Oleson K, Polvani L, Vertenstein M (2015) The community earth system model (CESM) large ensemble project: a community resource for studying climate change in the presence of internal climate variability. Bull Am Meteorol Soc 96(8):1333–1349
Ke ZT, Fan J, Wu Y (2015) Homogeneity pursuit. J Am Stat Assoc 110(509):175–194
Korobilis D, Pettenuzzo D (2019) Adaptive hierarchical priors for high-dimensional vector autoregressions. J Econ 212(1):241–271
Li F, Sang H (2019) Spatial homogeneity pursuit of regression coefficients for large datasets. J Am Stat Assoc 114(527):1050–1062
Lütkepohl H (2005) New introduction to multiple time series analysis. Springer, Berlin
Monbet V, Ailliot P (2017) Sparse vector Markov switching autoregressive models. Application to multivariate time series of temperature. Comput Stat Data Anal 108:40–51
Mu J, Wang G, Wang L (2018) Estimation and inference in spatially varying coefficient models. Environmetrics 29(1):e2485
Ngueyep R, Serban N (2015) Large-vector autoregression for multilayer spatially correlated time series. Technometrics 57(2):207–216
R Development Core Team (2020) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna
Rao SS (2008) Statistical analysis of a spatio-temporal model with location-dependent parameters and a test for spatial stationarity. J Time Ser Anal 29(4):673–694
Schweinberger M, Babkin S, Ensor KB (2017) High-dimensional multivariate time series with additional structure. J Comput Gr Stat 26(3):610–622
Shen X, Huang H-C (2010) Grouping pursuit through a regularization solution surface. J Am Stat Assoc 105(490):727–739
Sun Y, Genton MG (2011) Functional boxplots. J Comput Gr Stat 20(2):316–334
Sun Y, Wang HJ, Fuentes M (2016) Fused adaptive Lasso for spatial and temporal quantile function estimation. Technometrics 58(1):127–137
Tagle F, Castruccio S, Crippa P, Genton MG (2019) A non-Gaussian spatio-temporal model for daily wind speeds based on a multivariate skew-\(t\) distribution. J Time Ser Anal 40:312–326
Tibshirani R, Saunders M, Rosset S, Zhu J, Knight K (2005) Sparsity and smoothness via the fused lasso. J Roy Stat Soc B 67(1):91–108
Tibshirani RJ, Taylor J (2011) The solution path of the generalized lasso. Ann Stat 39(3):1335–1371
Viallon V, Lambert-Lacroix S, Höfling H, Picard F (2013) Adaptive generalized fused-Lasso: asymptotic properties and applications. In: HAL preprint, hal-00813281
Wikle CK, Berliner LM, Cressie N (1998) Hierarchical Bayesian space-time models. Environ Ecol Stat 5(2):117–154
Wikle CK, Milliff RF, Nychka D, Berliner LM (2001) Spatiotemporal hierarchical bayesian modeling: tropical ocean surface winds. J Am Stat Assoc 96(454):382–397
Yan Y, Genton MG (2019) Non-Gaussian autoregressive processes with Tukey \(g\)-and-\(h\) transformations. Environmetrics 30:e2503
Zhao Y, Bondell H (2020) Solution paths for the generalized lasso with applications to spatially varying coefficients regression. Comput Stat Data Anal 142:106821
Zou H (2006) The adaptive lasso and its oracle properties. J Am Stat Assoc 101(476):1418–1429
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This publication is based upon work supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Award No: OSR-2018-CRG7-3742 and Academia Sinica Investigator Award AS-IA-109-M05.
A Appendix
A Appendix
Derivation of (5):
For the two terms involved in the above equation,
Then, it is easy to see that the log-likelihood is a quadratic form of \(\varvec{\alpha }\), i.e.,
where \(\varvec{X}\) and \(\varvec{y}\) satisfy (3) and (4). \(\square \)
Asymptotic Properties:
Some notations are needed to state the asymptotic properties. We consider an undirected graph \({\mathcal {G}}=({\mathcal {V}},{\mathcal {E}})\), where \({\mathcal {V}}=\{1,\ldots ,m\}\) is a set of vertices corresponding to the \(m=K n_I + n_B\) parameters in \(\varvec{\alpha }\) to be estimated, and \({\mathcal {E}}\subset {\mathcal {V}}\times {\mathcal {V}}\) is a set of undirected edges that depends on the structure of the spatial grid. Here \({\mathcal {V}}\) can be partitioned into \(n_B+K\) disjoint components, including \(n_B\) isolated nodes (corresponding to the non-regularized coefficients in \(\varvec{\alpha }_\text {B}\)) and K identical disjoint components each with \(n_I\) vertices and the same inner grid structure (corresponding to the penalized coefficients in \(\varvec{\alpha }_k\), for \(k=1,\ldots ,K\)). With these notations, the penalty term in (6) can be rewritten as
where \(w_{{i},{j}}=|{\tilde{\alpha }}_i-{\tilde{\alpha }}_j|^{-\gamma }\).
We let \(\varvec{\alpha }^0\) be the true parameter vector and define \(\varvec{\xi }=\varvec{y}-\varvec{X}\varvec{\alpha }^0\). We introduce \({\mathcal {A}}=\left\{ (i,j) \in {\mathcal {E}}: \alpha ^0_i=\alpha ^0_j, i,j=1,\ldots ,n_I \right\} \) and consider the subgraph \({\mathcal {G}}_{{\mathcal {A}}}=({\mathcal {V}},{\mathcal {A}})\) of \({\mathcal {G}}\). We denote by \(m_0\) the number of its connected components, that is, the number of distinct values in \(\varvec{\alpha }^0\) supported by \({\mathcal {G}}\). We further denote by \({\mathcal {V}}_1,\ldots , {\mathcal {V}}_{m_0}\) the sets of nodes of each connected components of \({\mathcal {G}}_{{\mathcal {A}}}\) and set \(l_i=\min {{\mathcal {V}}_i}\) for \(i=1,\ldots ,m_0\). We define \(\varvec{\alpha }_{{\mathcal {A}}}^0=(\alpha ^0_{l_1},\ldots ,\alpha ^0_{l_{m_0}})'\) and \(\varvec{X}_{\mathcal {A}}\) a matrix whose i-th column, for \(i=1,\ldots ,m_0\), is \(\varvec{X}_{{\mathcal {A}}_i}=\sum _{j\in {\mathcal {V}}_i}\varvec{X}_j\), where \(\varvec{X}_j\) is the j-th column of \(\varvec{X}\). We assume that \(\varvec{X}'\varvec{X}/(T-1)\rightarrow \mathbf {C}\) as \(T\rightarrow \infty \) for some positive-definite \(m\times m\) matrix \(\mathbf {C}\), which depends on \(\varvec{\alpha }^0\) and \(\varvec{\Psi }\). We denote by \(\mathbf {C}_{{\mathcal {A}}}\) the limiting \(m_0\times m_0\) matrix of \(\varvec{X}_{{\mathcal {A}}}'\varvec{X}_{{\mathcal {A}}}/(T-1)\) as \(T\rightarrow \infty \).
We derive the asymptotic properties of the adaptive Lasso estimator \(\hat{\varvec{\alpha }}\) from solving (7). We let \({\mathcal {A}}_n=\left\{ (i,j) \in {\mathcal {E}}: {\hat{\alpha }}_i={\hat{\alpha }}_j,i,j=1,\ldots ,n_I \right\} \) and \(\hat{\varvec{\alpha }}_{{\mathcal {A}}}=({\hat{\alpha }}_{l_1},\ldots ,{\hat{\alpha }}_{l_{m_0}})'\).
Theorem 1
Suppose that \(\lambda /\sqrt{T}\rightarrow 0,\, \lambda T^{(\gamma -1)/2}\rightarrow \infty \), and \(\mathbf {C}=\lim _{T\rightarrow \infty }\) \(\varvec{X}'\varvec{X}/(T-1)\) is positive-definite. Then as \(T\rightarrow \infty \), the following are satisfied by the adaptive Lasso estimator \(\hat{\varvec{\alpha }}\):
-
1.
Consistency in grouping: \(\Pr ({\mathcal {A}}_n={\mathcal {A}})\rightarrow 1\).
-
2.
Asymptotic normality: \(\sqrt{T}\big (\hat{\varvec{\alpha }}_{{\mathcal {A}}}-\varvec{\alpha }^0_{{\mathcal {A}}}\big )\xrightarrow {d}{\mathcal {N}}_{m_0}(\varvec{0},\mathbf {C}_{{\mathcal {A}}}^{-1})\).
Proof
The following proof is obtained by adapting the Proof of Theorem 2 in Zou (2006) and Theorem 3 in Viallon et al. (2013).
First, it is known that \(\hat{\varvec{\Psi }}\) in (8) is a consistent estimator of \(\varvec{\Psi }\), as T goes to infinity; see Chapter 5 of Lütkepohl (2005). We define \(V_T(\mathbf {u})=F(\varvec{\alpha }^0)-F(\varvec{\alpha }^0+\mathbf {u}/\sqrt{T})\), with F defined in (6). It is obvious that \(V_T(\mathbf {u})\) is minimized at \(\sqrt{T}(\hat{\varvec{\alpha }}-\varvec{\alpha }^0)\) and
We have
We denote \(\varvec{\mathbf {u}}_{{\mathcal {A}}}=(\mathbf {u}_{l_1},\ldots ,\mathbf {u}_{l_{m_0}})'\) and then, with an application of the Martingale difference central limit theory to \(\varvec{\xi }'\varvec{X}\), we obtain
for \(\mathbf {u}\in {\mathbb {R}}^m\), where \({\mathbf{W }}_{\mathcal {A}}\sim {\mathcal {N}}_m(\varvec{0},{\mathbf{C }}_{{\mathcal {A}}})\); V is convex and has a unique minimum satisfying \(u_i=u_j\) for all \((i,j)\in {\mathcal {A}}\) and \(\mathbf {u}_{\mathcal {A}}=\mathbf {C}_{{\mathcal {A}}}^{-1}\mathbf {W}_{\mathcal {A}}\). The asymptotic normality part can be derived as in Zou (2006) by using the epi-convergence results.
For consistency in grouping, we need to show that for all \((i,j)\not \in {\mathcal {A}}, \Pr ((i,j)\in {\mathcal {A}}_n^c)\rightarrow 1\) and for all \((i,j) \in {\mathcal {A}}, \Pr ((i,j)\in {\mathcal {A}}_n^c)\rightarrow 0\). The first part is implied by the asymptotic normality result. To prove the second part, we apply the subgradient equations for the optimality condition, for \(i=1,\ldots ,m\):
where \(t_{ij}={{\,\mathrm{sign}\,}}({\hat{\alpha }}_i-{\hat{\alpha }}_j)\) if \({\hat{\alpha }}_i\ne {\hat{\alpha }}_j\) and \(t_{ij}\) is some real number in \([-1,1]\) if \({\hat{\alpha }}_i= {\hat{\alpha }}_j.\) We prove by contradiction. Suppose that for \({\mathcal {V}}_k\) that contains at least two vertices, there exist \(i,j \in {\mathcal {V}}_k\) such that \({\hat{\alpha }}_i\ne {\hat{\alpha }}_j\). We define \(\displaystyle a^{\min }=\min _{i \in {\mathcal {V}}_k} {\hat{\alpha }}_i\) and \({\mathcal {V}}^{\min }=\left\{ i: i \in {\mathcal {V}}_k \text { and } {\hat{\alpha }}_i=a^{\min }\right\} \). Summing up the optimality conditions over the indices in \({\mathcal {V}}^{\min }\), we get:
where in the right-hand side, the first sum converges to 0 in probability, while the second sum tends to \(-\infty \). However, the left-hand side is \(O_{p}(1)\), since it can be decomposed as the sum of two asymptotically normal variables as in Zou (2006) with an application of the martingale central limit theorem. Therefore, \(\Pr ((i,j)\in {\mathcal {A}}_n^c)\rightarrow 0\). \(\square \)
Rights and permissions
About this article
Cite this article
Yan, Y., Huang, HC. & Genton, M.G. Vector Autoregressive Models with Spatially Structured Coefficients for Time Series on a Spatial Grid. JABES 26, 387–408 (2021). https://doi.org/10.1007/s13253-021-00444-4
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s13253-021-00444-4