Abstract
Analyzing massive spatial datasets with a Gaussian process model poses serious computational challenges, a problem that arises routinely in applications such as environmental modeling, ecology, forestry and environmental health. We present a novel approximate inference methodology that uses profile likelihood and Krylov subspace methods to estimate the spatial covariance parameters and to make spatial predictions with uncertainty quantification for point-referenced spatial data. “Kryging” combines Kriging and Krylov subspace methods and applies to observations on a regular grid as well as irregularly spaced observations, and to any Gaussian process with a stationary isotropic (and certain geometrically anisotropic) covariance function, including the popular Matérn covariance family. We exploit the block-Toeplitz-with-Toeplitz-blocks (BTTB) structure of the covariance matrix and use fast Fourier transform methods to bypass the computational and memory bottlenecks of approximating the log-determinant and matrix-vector products. We perform extensive simulation studies to show the effectiveness of our model by varying sample sizes, spatial parameter values and sampling designs. We also analyze a real dataset of land surface temperature readings taken by the MODIS satellite. Compared to existing methods, the proposed method performs satisfactorily with much less computation time and better scalability.
Availability of Data and Material
The dataset analyzed in Sect. 5 is available in the GitHub repository for the Heaton et al. (2019) project.
References
Ambikasaran, S., Foreman-Mackey, D., Greengard, L., Hogg, D.W., O’Neil, M.: Fast direct methods for Gaussian processes. IEEE Trans. Pattern Anal. Mach. Intell. 38(2), 252–265 (2015)
Anitescu, M., Chen, J., Wang, L.: A matrix-free approach for solving the parametric Gaussian process maximum likelihood problem. SIAM J. Sci. Comput. 34(1), A240–A262 (2012)
Aune, E., Simpson, D.P., Eidsvik, J.: Parameter estimation in high dimensional gaussian distributions. Stat. Comput. 24(2), 247–263 (2014)
Banerjee, S., Gelfand, A.E., Finley, A.O., Sang, H.: Gaussian predictive process models for large spatial data sets. J. R. Statist. Soc. Ser. B Statist. Methodol. 70(4), 825–848 (2008)
Barbian, M.H., Assunção, R.M.: Spatial subsemble estimator for large geostatistical data. Spat. Statist. 22, 68–88 (2017)
Benbow, S.J.: Solving generalized least-squares problems with LSQR. SIAM J. Matrix Anal. Appl. 21(1), 166–177 (1999)
Bradley, J.R., Cressie, N., Shi, T., et al.: A comparison of spatial predictors when datasets could be very large. Statist. Surv. 10, 100–131 (2016)
Chung, J., Saibaba, A.K.: Generalized hybrid iterative methods for large-scale Bayesian inverse problems. SIAM J. Sci. Comput. 39(5), S24–S46 (2017)
Chung, J., Saibaba, A.K., Brown, M., Westman, E.: Efficient generalized Golub-Kahan based methods for dynamic inverse problems. Inverse Prob. 34(2), 024005 (2018)
Cox, D.R., Snell, E.J.: Analysis of Binary Data, vol. 32. CRC Press, Cambridge (1989)
Cressie, N., Johannesson, G.: Fixed rank Kriging for very large spatial data sets. J. R. Statist. Soc. Ser. B Statist. Methodol. 70(1), 209–226 (2008)
Datta, A., Banerjee, S., Finley, A.O., Gelfand, A.E.: Hierarchical nearest-neighbor Gaussian process models for large geostatistical datasets. J. Am. Stat. Assoc. 111(514), 800–812 (2016)
Datta, A., Banerjee, S., Finley, A.O., Gelfand, A.E.: On nearest-neighbor Gaussian process models for massive spatial data. Wiley Interdiscip. Rev. Comput. Statist. 8(5), 162–171 (2016)
Datta, A., Banerjee, S., Finley, A.O., Hamm, N.A., Schaap, M.: Nonseparable dynamic nearest neighbor Gaussian process models for large spatio-temporal data with an application to particulate matter analysis. Ann. Appl. Statist. 10(3), 1286 (2016)
Den Hertog, D., Kleijnen, J.P., Siem, A.Y.: The correct Kriging variance estimated by bootstrapping. J. Oper. Res. Soc. 57(4), 400–409 (2006)
Dutta, S., Mondal, D.: REML estimation with intrinsic Matérn dependence in the spatial linear mixed model. Electr. J. Statist. 10(2), 2856–2893 (2016)
Eidsvik, J., Shaby, B.A., Reich, B.J., Wheeler, M., Niemi, J.: Estimation and prediction in spatial models with block composite likelihoods. J. Comput. Graph. Stat. 23(2), 295–315 (2014)
Eriksson, D., Dong, K., Lee, E., Bindel, D., Wilson, A.G.: Scaling Gaussian process regression with derivatives. In: Advances in Neural Information Processing Systems, pp. 6867–6877 (2018)
Finley, A.O., Sang, H., Banerjee, S., Gelfand, A.E.: Improving the performance of predictive process modeling for large datasets. Comput. Statist. Data Anal. 53(8), 2873–2884 (2009)
Fuentes, M.: Approximate likelihood for large irregularly spaced spatial data. J. Am. Stat. Assoc. 102(477), 321–331 (2007)
Furrer, R., Genton, M.G., Nychka, D.: Covariance tapering for interpolation of large spatial datasets. J. Comput. Graph. Stat. 15(3), 502–523 (2006)
Gneiting, T., Ševčíková, H., Percival, D.B., Schlather, M., Jiang, Y.: Fast and exact simulation of large Gaussian lattice systems in \({\mathbb{R}}^2\): Exploring the limits. J. Comput. Graph. Stat. 15(3), 483–501 (2006)
Graham, I.G., Kuo, F.Y., Nuyens, D., Scheichl, R., Sloan, I.H.: Analysis of circulant embedding methods for sampling stationary random fields. SIAM J. Numer. Anal. 56(3), 1871–1895 (2018)
Gray, R.M.: Toeplitz and circulant matrices: A review. Found. Trends® Commun. Inf. Theory 2(3), 155–239 (2006)
Guhaniyogi, R., Banerjee, S.: Meta-Kriging: Scalable Bayesian modeling and inference for massive spatial datasets. Technometrics 60(4), 430–444 (2018)
Guinness, J.: Spectral density estimation for random fields via periodic embeddings. Biometrika 106(2), 267–286 (2019)
Guinness, J., Fuentes, M.: Circulant embedding of approximate covariances for inference from Gaussian data on large lattices. J. Comput. Graph. Stat. 26(1), 88–97 (2017)
Gyires, B.: Eigenwerte verallgemeinerter Toeplitzschen Matrizen. Publ. Math. Debrecen 4, 171–179 (1956)
Heaton, M.J., Datta, A., Finley, A.O., Furrer, R., Guinness, J., Guhaniyogi, R., Gerber, F., Gramacy, R.B., Hammerling, D., Katzfuss, M., et al.: A case study competition among methods for analyzing large spatial data. J. Agric. Biol. Environ. Stat. 24(3), 398–425 (2019)
Higdon, D.: Space and space-time modeling using process convolutions. In: Quantitative methods for current environmental issues, Springer, pp. 37–56 (2002)
Kang, E.L., Cressie, N.: Bayesian inference for the spatial random effects model. J. Am. Stat. Assoc. 106(495), 972–983 (2011)
Katzfuss, M.: A multi-resolution approximation for massive spatial datasets. J. Am. Stat. Assoc. 112(517), 201–214 (2017)
Katzfuss, M., Cressie, N.: Spatio-temporal smoothing and EM estimation for massive remote-sensing data sets. J. Time Ser. Anal. 32(4), 430–446 (2011)
Katzfuss, M., Hammerling, D.: Parallel inference for massive distributed spatial data using low-rank models. Stat. Comput. 27(2), 363–375 (2017)
Kaufman, C.G., Schervish, M.J., Nychka, D.W.: Covariance tapering for likelihood-based estimation in large spatial data sets. J. Am. Stat. Assoc. 103(484), 1545–1555 (2008)
Kent, J.T., Mardia, K.V.: Spectral and circulant approximations to the likelihood for stationary Gaussian random fields. J. Statist. Plan. Inference 50(3), 379–394 (1996)
Lindgren, F., Rue, H., Lindström, J.: An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach. J. R. Statist. Soc. Ser. B Statist. Methodol. 73(4), 423–498 (2011)
Liu, H., Ong, Y.S., Shen, X., Cai, J.: When Gaussian process meets big data: A review of scalable GPs. IEEE Trans. Neural Netw. Learn. Syst. (2020)
Martino, S., Rue, H.: Implementing approximate Bayesian inference using integrated nested Laplace approximation: a manual for the INLA program. Department of Mathematical Sciences, NTNU, Norway (2009)
Matérn, B.: Spatial Variation. Lecture Notes in Statistics, vol. 36 (1960)
Minden, V., Damle, A., Ho, K.L., Ying, L.: Fast spatial Gaussian process maximum likelihood estimation via skeletonization factorizations. Multiscale Model. Simul. 15(4), 1584–1611 (2017)
Nychka, D., Bandyopadhyay, S., Hammerling, D., Lindgren, F., Sain, S.: A multiresolution Gaussian process model for the analysis of large spatial datasets. J. Comput. Graph. Stat. 24(2), 579–599 (2015)
Paciorek, C.J., Lipshitz, B., Zhuo, W., Kaufman, C.G., Thomas, R.C., et al.: Parallelizing Gaussian process calculations in R. J. Statist. Softw. 63(10) (2015)
Rue, H., Held, L.: Gaussian Markov random fields: theory and applications. CRC Press, Cambridge (2005)
Saad, Y.: Iterative methods for sparse linear systems, 2nd edn. Society for Industrial and Applied Mathematics, Philadelphia, PA (2003). https://doi.org/10.1137/1.9780898718003
Stein, M.L.: Fast and exact simulation of fractional Brownian surfaces. J. Comput. Graph. Stat. 11(3), 587–599 (2002)
Stein, M.L.: Statistical properties of covariance tapers. J. Comput. Graph. Stat. 22(4), 866–885 (2013)
Stein, M.L., Chi, Z., Welty, L.J.: Approximating likelihoods for large spatial data sets. J. R. Statist. Soc. Ser. B Statist. Methodol. 66(2), 275–296 (2004)
Sun, Y., Li, B., Genton, M.G.: Geostatistics for large datasets. In: Advances and challenges in space-time modelling of natural events, Springer, pp. 55–77 (2012)
Ubaru, S., Chen, J., Saad, Y.: Fast estimation of tr(f(\({A}\))) via stochastic Lanczos quadrature. SIAM J. Matrix Anal. Appl. 38(4), 1075–1099 (2017)
Varin, C., Reid, N., Firth, D.: An overview of composite likelihood methods. Statistica Sinica 21, 5–42 (2011)
Vecchia, A.V.: Estimation and model identification for continuous spatial processes. J. R. Stat. Soc. Ser. B Methodol. 50(2), 297–312 (1988)
Wendland, H.: Piecewise polynomial, positive definite and compactly supported radial functions of minimal degree. Adv. Comput. Math. 4(1), 389–396 (1995)
Widom, H.: Asymptotic behavior of block Toeplitz matrices and determinants. Adv. Math. 13(3), 284–322 (1974)
Acknowledgements
The authors were partially supported by the National Science Foundation through awards DMS-1845406 and DMS-1638521, by the National Institutes of Health through awards R01ES031651-01 and R01ES027892, and by the King Abdullah University of Science and Technology grant 3800.2. We would like to thank them for their support.
Funding
The authors were partially supported by the National Science Foundation through awards DMS-1845406 and DMS-1638521, by the National Institutes of Health through awards R01ES031651-01 and R01ES027892, and by the King Abdullah University of Science and Technology grant 3800.2.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Code availability
A GitHub repository has been set up that contains code and a demonstration file for the methods described in the article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Gradient and Hessian computation for the optimization procedure
In this section, we present the necessary details of computing and approximating the gradient and Hessian for the optimization routine.
We first derive exact expressions for the gradient and then show how to approximate them using the strategies in Sects. 3.2 and 3.1. Computing the analytical gradient requires the derivatives of \({\varvec{\Gamma } }= \left( \frac{1}{\sigma ^2}{\varvec{\Sigma }}(\varvec{\theta })^{-1} + \frac{1}{\tau ^2}{\mathbf {A}}^\mathsf {T}{\mathbf {A}}\right) ^{-1}\) and \({\widehat{{\mathbf {x}}}}(\varvec{\theta })\) with respect to each of \(\mu , \sigma ^2, \tau ^2\) and \(\rho \). For convenience, we reparameterize \(1/\sigma ^2 = \lambda ^2\) and \(1/\tau ^2 = \lambda _e^2\); working with precisions rather than variances simplifies the analytical derivatives. Under the new parameterization,
and
where \({\widehat{\varvec{\psi }}}(\varvec{\theta }) = {\mathbf {y}}- {\mathbf {X}}\varvec{\beta }- {\mathbf {A}}{\hat{{\mathbf {x}}}}(\varvec{\theta })\).
The derivatives of \(\varvec{\Gamma }\) with respect to \(\lambda ^2\), \(\lambda _e^2\) and \(\rho \) are
$$\begin{aligned} \frac{\partial \varvec{\Gamma }}{\partial \lambda ^2} = -\varvec{\Gamma }{\varvec{\Sigma }}(\rho )^{-1}\varvec{\Gamma }, \qquad \frac{\partial \varvec{\Gamma }}{\partial \lambda _e^2} = -\varvec{\Gamma }{\mathbf {A}}^\mathsf {T}{\mathbf {A}}\varvec{\Gamma }, \qquad \frac{\partial \varvec{\Gamma }}{\partial \rho } = \lambda ^2\,\varvec{\Gamma }{\varvec{\Sigma }}(\rho )^{-1}\,\text {d}{\varvec{\Sigma }}(\rho )\,{\varvec{\Sigma }}(\rho )^{-1}\varvec{\Gamma }, \end{aligned}$$
where \(\text {d}{\varvec{\Sigma }}(\rho )\) denotes the derivative of \({\varvec{\Sigma }}(\rho )\) with respect to \(\rho \). This derivative is easy to compute analytically and inherits the BTTB structure of \({\varvec{\Sigma }}(\rho )\).
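To make the BTTB structure of \(\text {d}{\varvec{\Sigma }}(\rho )\) concrete, the sketch below builds the arrays of grid-lag covariances that determine \({\varvec{\Sigma }}(\rho )\) and \(\text {d}{\varvec{\Sigma }}(\rho )\) on a regular grid. The exponential covariance, unit grid spacing and all parameter values are illustrative assumptions, not the specific choices of the paper.

```python
import numpy as np

# Minimal sketch (illustrative, not the authors' code): for a stationary
# covariance evaluated on a regular m1 x m2 grid, Sigma(rho) and its
# derivative d Sigma(rho) are both BTTB, so each is fully determined by the
# covariances at the distinct grid lags.  Here we use an exponential
# covariance C(h) = sigma2 * exp(-h / rho) purely as an example.

def base_lags(m1, m2):
    """Distances for all distinct lags (0..m1-1) x (0..m2-1) on a unit grid."""
    di, dj = np.meshgrid(np.arange(m1), np.arange(m2), indexing="ij")
    return np.hypot(di, dj)

def exp_cov(h, sigma2, rho):
    return sigma2 * np.exp(-h / rho)

def d_exp_cov_d_rho(h, sigma2, rho):
    # Analytical derivative of sigma2 * exp(-h / rho) with respect to rho.
    return sigma2 * (h / rho**2) * np.exp(-h / rho)

m1, m2, sigma2, rho = 50, 60, 1.0, 4.0
h = base_lags(m1, m2)
sigma_base = exp_cov(h, sigma2, rho)            # base block column of Sigma(rho)
dsigma_base = d_exp_cov_d_rho(h, sigma2, rho)   # base block column of d Sigma(rho)
# Both n x n BTTB matrices (n = m1 * m2) are represented by these
# m1 x m2 arrays; neither matrix is ever formed explicitly.
```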
Using the expressions in (18), we compute the derivatives of \({\widehat{{\mathbf {x}}}}(\varvec{\theta })\) to be
Substituting the analytical derivatives of \(\varvec{\Gamma }\) and \({\widehat{{\mathbf {x}}}}(\varvec{\theta })\) into the expression for the analytical gradient, we obtain
where \(\text {dL}\) is the derivative of \(\log \det \,{\varvec{\Sigma }}(\rho )\) with respect to \(\rho \).
We approximate the gradient expressions in (20) by approximating \({\widehat{{\mathbf {x}}}}(\varvec{\theta })\) by \({\mathbf {x}}_k^*(\varvec{\theta })\) as in (12) and using the exact arithmetic identities expressed in (10). The approximated gradients can be computed as
where \(\varvec{\psi }^*_k(\varvec{\theta }) = {\mathbf {y}}- {\mathbf {X}}\varvec{\beta }- {\mathbf {A}}{\mathbf {x}}^*_k(\varvec{\theta })\) and \({\mathbf {V}}_k, {\mathbf {z}}_k\) have been defined in Sect. 3.2.
\(\widehat{\text {dL}}\) is an approximation to \(\text {dL}\), the derivative of the log-determinant of \({\varvec{\Sigma }}(\rho )\) with respect to \(\rho \). The analytical expression is
$$\begin{aligned} \text {dL} = \text {tr}\left( {\varvec{\Sigma }}(\rho )^{-1}\,\text {d}{\varvec{\Sigma }}(\rho )\right) . \end{aligned}$$
This is infeasible to compute directly and is therefore approximated using the BTTB structure of \({\varvec{\Sigma }}(\rho )\) and \(\text {d}{\varvec{\Sigma }}(\rho )\).
Any symmetric BTTB matrix can be embedded in a block circulant matrix with circulant blocks (BCCB), as was done when computing the log-determinant itself, and the eigenvalues of the BTTB matrix can be approximated from those of its BCCB extension. Any BCCB matrix is diagonalizable as \({\mathbf {F}}{\mathbf {D}}{\mathbf {F}}^\mathsf{T}\), where \({\mathbf {F}}\) is a scaled matrix of d-dimensional (d = 2, in our case) Fourier coefficients that does not depend on the particular BCCB matrix being diagonalized. Therefore, writing the BCCB extensions of \({\varvec{\Sigma }}(\rho )\) and \(\text {d}{\varvec{\Sigma }}(\rho )\) as \({\mathbf {F}}{\mathbf {D}}_1{\mathbf {F}}^\mathsf{T}\) and \({\mathbf {F}}{\mathbf {D}}_2{\mathbf {F}}^\mathsf{T}\), respectively, we obtain the approximation
$$\begin{aligned} \text {dL} \approx \text {tr}\left( {\mathbf {D}}_1^{-1}{\mathbf {D}}_2\right) . \end{aligned}$$
Since both \({\mathbf {D}}_1\) and \({\mathbf {D}}_2\) are diagonal, approximating \(\text {dL}\) reduces to computing \({\mathbf {D}}_1\) and \({\mathbf {D}}_2\), which can be computed by d-dimensional FFTs of the first blocks of the corresponding extended BCCB structures, followed by appropriate subsetting. Kent and Mardia (1996) demonstrated the equivalence between the derivative of the log-determinant of a BTTB matrix and that of its corresponding BCCB matrix, showing that this approximation has the same error rate as the approximation of the log-determinant itself. Approximating the derivative of the log-determinant also has the same cost as approximating the log-determinant, \({\mathcal {O}}(n \log n)\).
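The sketch below is a rough numerical illustration of this idea, not the authors' implementation: it obtains the diagonals of \({\mathbf {D}}_1\) and \({\mathbf {D}}_2\) as 2-d FFTs of the wrapped base blocks and uses them to approximate the log-determinant and \(\text {dL}\). The unit-variance exponential covariance, the doubled embedding size and the simple \(n/N\) rescaling (in place of the paper's exact subsetting rule) are all assumptions made for illustration.

```python
import numpy as np

# Rough illustration of approximating log det Sigma(rho) and
# dL = tr(Sigma(rho)^{-1} d Sigma(rho)) via the BCCB extension, using only
# 2-d FFTs.  Covariance model, embedding size and n/N rescaling are
# illustrative assumptions.

def bccb_eigenvalues(cov_fun, m1, m2, *args):
    """Eigenvalues of the BCCB extension of a stationary covariance on an
    m1 x m2 grid, obtained as the 2-d FFT of the wrapped base block."""
    M1, M2 = 2 * m1, 2 * m2
    lag1 = np.minimum(np.arange(M1), M1 - np.arange(M1))
    lag2 = np.minimum(np.arange(M2), M2 - np.arange(M2))
    h = np.hypot(lag1[:, None], lag2[None, :])
    return np.fft.fft2(cov_fun(h, *args)).real  # real: the base is symmetric

def exp_cov(h, rho):
    return np.exp(-h / rho)

def d_exp_cov_d_rho(h, rho):
    return (h / rho**2) * np.exp(-h / rho)

m1, m2, rho = 50, 60, 4.0
n, N = m1 * m2, 4 * m1 * m2
d1 = bccb_eigenvalues(exp_cov, m1, m2, rho)           # diagonal of D1
d2 = bccb_eigenvalues(d_exp_cov_d_rho, m1, m2, rho)   # diagonal of D2
assert np.all(d1 > 0), "embedding not positive definite; pad further"

logdet_approx = (n / N) * np.sum(np.log(d1))   # circulant approximation of log det
dL_approx = (n / N) * np.sum(d2 / d1)          # circulant approximation of dL
print(logdet_approx, dL_approx)
```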
While minimizing the negative log-likelihood function, the expected Hessian is simply the information matrix \({\mathbf {I}}(\varvec{\theta })\). Writing \(\ell (\varvec{\theta })\) for the log-likelihood, while
$$\begin{aligned} {\mathbf {I}}(\varvec{\theta }) = -{\mathbb {E}}\left[ \nabla _2\, \ell (\varvec{\theta })\right] , \end{aligned}$$
we also have
$$\begin{aligned} {\mathbf {I}}(\varvec{\theta }) = {\mathbb {E}}\left[ \nabla \ell (\varvec{\theta })\, \nabla \ell (\varvec{\theta })^\mathsf{T}\right] . \end{aligned}$$
Here the expectations are taken with respect to \({\mathbf {y}}\), and \(\nabla \), \(\nabla _2\) denote the gradient and Hessian formed from the first- and second-order partial derivatives with respect to \(\varvec{\theta }\). Therefore, the outer product of the gradient with itself serves as a rank-one estimate of the Hessian in a likelihood optimization problem. Although we use the profile likelihood rather than the full likelihood, the approximation remains valid in an asymptotic sense since the maximum likelihood and profile likelihood estimators share the same asymptotic properties. This motivates taking the outer product of the approximated gradient with itself as a rank-one approximation to the Hessian.
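To make the information identity concrete, the following small Monte Carlo check verifies, for a toy i.i.d. Gaussian model (not the paper's spatial model), that the average outer product of the score matches the negative average Hessian. All names and parameter values here are illustrative.

```python
import numpy as np

# Monte Carlo check of the identity E[grad grad^T] = -E[Hessian] for a toy
# model y_i ~ N(mu, s2) with theta = (mu, s2).

rng = np.random.default_rng(0)
mu, s2, n, reps = 1.0, 2.0, 50, 20000

def score_and_hessian(y, mu, s2):
    """Gradient and Hessian of the log-likelihood of an i.i.d. N(mu, s2) sample."""
    m = y.size
    r = y - mu
    g = np.array([r.sum() / s2,
                  -m / (2 * s2) + (r @ r) / (2 * s2**2)])
    H = np.array([[-m / s2,                 -r.sum() / s2**2],
                  [-r.sum() / s2**2,  m / (2 * s2**2) - (r @ r) / s2**3]])
    return g, H

outer_sum = np.zeros((2, 2))
hess_sum = np.zeros((2, 2))
for _ in range(reps):
    y = rng.normal(mu, np.sqrt(s2), n)
    g, H = score_and_hessian(y, mu, s2)
    outer_sum += np.outer(g, g)
    hess_sum += H

print(outer_sum / reps)   # approx. Fisher information: diag(n/s2, n/(2 s2^2))
print(-hess_sum / reps)   # approx. the same matrix
```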
However, we compute the unique entries of the exact Hessian to be
where \(\text {d}^2\text {L}\) represents the second derivative of \(\log \det \,{\varvec{\Sigma }}(\rho )\) with respect to \(\rho \) and \(\text {d}^2{\varvec{\Sigma }}(\rho )\) is the second derivative of \({\varvec{\Sigma }}(\rho )\) with respect to \(\rho \); like \({\varvec{\Sigma }}(\rho )\) and \(\text {d}{\varvec{\Sigma }}(\rho )\), \(\text {d}^2{\varvec{\Sigma }}(\rho )\) also has BTTB structure.
These entries are then approximated using the approximation to \({\varvec{\Gamma }}\) as presented in Chung et al. (2018), namely
where \({\mathbf {Z}}_k = {\varvec{\Sigma }}(\rho ){\mathbf {V}}_k{\mathbf {W}}_k\) with \({\mathbf {B}}_k^\mathsf{T}{\mathbf {B}}_k = {\mathbf {W}}_k\varvec{\varTheta }_k{\mathbf {W}}_k\) and \(\varvec{\varDelta }_k = \left( {\mathbf {I}}+ \lambda ^{-2}\varvec{\varTheta }_k \right) ^{-1}\).
We define \({\mathbf {z}}_0 = {\mathbf {y}}- {\mathbf {X}}\varvec{\beta }- {\mathbf {A}}{\mathbf {x}}^*_k(\varvec{\theta }) = \varvec{\psi }^*_k(\varvec{\theta })\). The approximated entries of the Hessian are
where \(\widehat{\text {d}^2\text {L}}\) is a numerical approximation to \(\text {d}^2\text {L}\). We do not use this approximation in our current implementation, but hope to use it in future work.
Additional tables from the simulation study
In this section, we provide additional results from the simulation studies. Table 6 evaluates parameter estimation for the first simulation study for both the SPDE and Kryging methods; Tables 7 and 8 do the same for the second and third simulation studies. The results are similar across the board to those reported in Sect. 4: SPDE performs better in estimating the nugget parameter \(\tau ^2\), while Kryging performs better in estimating the partial sill parameter \(\sigma ^2\). Both methods do equally well in estimating the mean parameter \(\beta \) and the spatial range parameter \(\rho \).
About this article
Cite this article
Majumder, S., Guan, Y., Reich, B.J. et al. Kryging: geostatistical analysis of large-scale datasets using Krylov subspace methods. Stat Comput 32, 74 (2022). https://doi.org/10.1007/s11222-022-10104-3