Abstract
Analyzing massive spatial datasets with a Gaussian process model poses serious computational challenges, a problem that arises routinely in applications such as environmental modeling, ecology, forestry and environmental health. We present a novel approximate inference methodology that uses profile likelihood and Krylov subspace methods to estimate the spatial covariance parameters and to make spatial predictions with uncertainty quantification for point-referenced spatial data. “Kryging” combines Kriging and Krylov subspace methods and applies to observations on a regular grid as well as irregularly spaced observations, and to any Gaussian process with a stationary isotropic (and certain geometrically anisotropic) covariance function, including the popular Matérn covariance family. We exploit the block-Toeplitz-with-Toeplitz-blocks (BTTB) structure of the covariance matrix and use fast Fourier transform methods to bypass the computational and memory bottlenecks of approximating the log-determinant and matrix-vector products. We perform extensive simulation studies to show the effectiveness of our model by varying sample sizes, spatial parameter values and sampling designs. We also analyze a real dataset of land surface temperature readings taken by the MODIS satellite. Compared to existing methods, the proposed method performs satisfactorily with much less computation time and better scalability.
Availability of Data and Material
The dataset analyzed in Sect. 5 is available in the GitHub repository for the Heaton et al. (2019) project.
References
Ambikasaran, S., Foreman-Mackey, D., Greengard, L., Hogg, D.W., O’Neil, M.: Fast direct methods for Gaussian processes. IEEE Trans. Pattern Anal. Mach. Intell. 38(2), 252–265 (2015)
Anitescu, M., Chen, J., Wang, L.: A matrix-free approach for solving the parametric Gaussian process maximum likelihood problem. SIAM J. Sci. Comput. 34(1), A240–A262 (2012)
Aune, E., Simpson, D.P., Eidsvik, J.: Parameter estimation in high dimensional gaussian distributions. Stat. Comput. 24(2), 247–263 (2014)
Banerjee, S., Gelfand, A.E., Finley, A.O., Sang, H.: Gaussian predictive process models for large spatial data sets. J. R. Statist. Soc. Ser. B Statist. Methodol. 70(4), 825–848 (2008)
Barbian, M.H., Assunção, R.M.: Spatial subsemble estimator for large geostatistical data. Spat. Statist. 22, 68–88 (2017)
Benbow, S.J.: Solving generalized least-squares problems with LSQR. SIAM J. Matrix Anal. Appl. 21(1), 166–177 (1999)
Bradley, J.R., Cressie, N., Shi, T., et al.: A comparison of spatial predictors when datasets could be very large. Statist. Surv. 10, 100–131 (2016)
Chung, J., Saibaba, A.K.: Generalized hybrid iterative methods for large-scale Bayesian inverse problems. SIAM J. Sci. Comput. 39(5), S24–S46 (2017)
Chung, J., Saibaba, A.K., Brown, M., Westman, E.: Efficient generalized Golub-Kahan based methods for dynamic inverse problems. Inverse Prob. 34(2), 024005 (2018)
Cox, D.R., Snell, E.J.: Analysis of Binary Data, vol. 32. CRC Press, Cambridge (1989)
Cressie, N., Johannesson, G.: Fixed rank Kriging for very large spatial data sets. J. R. Statist. Soc. Ser. B Statist. Methodol. 70(1), 209–226 (2008)
Datta, A., Banerjee, S., Finley, A.O., Gelfand, A.E.: Hierarchical nearest-neighbor Gaussian process models for large geostatistical datasets. J. Am. Stat. Assoc. 111(514), 800–812 (2016)
Datta, A., Banerjee, S., Finley, A.O., Gelfand, A.E.: On nearest-neighbor Gaussian process models for massive spatial data. Wiley Interdiscip. Rev. Comput. Statist. 8(5), 162–171 (2016)
Datta, A., Banerjee, S., Finley, A.O., Hamm, N.A., Schaap, M.: Nonseparable dynamic nearest neighbor Gaussian process models for large spatio-temporal data with an application to particulate matter analysis. Ann. Appl. Statist. 10(3), 1286 (2016)
Den Hertog, D., Kleijnen, J.P., Siem, A.Y.: The correct Kriging variance estimated by bootstrapping. J. Oper. Res. Soc. 57(4), 400–409 (2006)
Dutta, S., Mondal, D.: REML estimation with intrinsic Matérn dependence in the spatial linear mixed model. Electr. J. Statist. 10(2), 2856–2893 (2016)
Eidsvik, J., Shaby, B.A., Reich, B.J., Wheeler, M., Niemi, J.: Estimation and prediction in spatial models with block composite likelihoods. J. Comput. Graph. Stat. 23(2), 295–315 (2014)
Eriksson, D., Dong, K., Lee, E., Bindel, D., Wilson, A.G.: Scaling Gaussian process regression with derivatives. In: Advances in Neural Information Processing Systems, pp. 6867–6877 (2018)
Finley, A.O., Sang, H., Banerjee, S., Gelfand, A.E.: Improving the performance of predictive process modeling for large datasets. Comput. Statist. Data Anal. 53(8), 2873–2884 (2009)
Fuentes, M.: Approximate likelihood for large irregularly spaced spatial data. J. Am. Stat. Assoc. 102(477), 321–331 (2007)
Furrer, R., Genton, M.G., Nychka, D.: Covariance tapering for interpolation of large spatial datasets. J. Comput. Graph. Stat. 15(3), 502–523 (2006)
Gneiting, T., Ševčíková, H., Percival, D.B., Schlather, M., Jiang, Y.: Fast and exact simulation of large Gaussian lattice systems in \({\mathbb{R}}^2\): Exploring the limits. J. Comput. Graph. Stat. 15(3), 483–501 (2006)
Graham, I.G., Kuo, F.Y., Nuyens, D., Scheichl, R., Sloan, I.H.: Analysis of circulant embedding methods for sampling stationary random fields. SIAM J. Numer. Anal. 56(3), 1871–1895 (2018)
Gray, R.M.: Toeplitz and circulant matrices: A review. Found. Trends® Commun. Inf. Theory 2(3), 155–239 (2006)
Guhaniyogi, R., Banerjee, S.: Meta-Kriging: Scalable Bayesian modeling and inference for massive spatial datasets. Technometrics 60(4), 430–444 (2018)
Guinness, J.: Spectral density estimation for random fields via periodic embeddings. Biometrika 106(2), 267–286 (2019)
Guinness, J., Fuentes, M.: Circulant embedding of approximate covariances for inference from Gaussian data on large lattices. J. Comput. Graph. Stat. 26(1), 88–97 (2017)
Gyires, B.: Eigenwerte verallgemeinerter Toeplitzschen Matrizen. Publ. Math. Debrecen 4, 171–179 (1956)
Heaton, M.J., Datta, A., Finley, A.O., Furrer, R., Guinness, J., Guhaniyogi, R., Gerber, F., Gramacy, R.B., Hammerling, D., Katzfuss, M., et al.: A case study competition among methods for analyzing large spatial data. J. Agric. Biol. Environ. Stat. 24(3), 398–425 (2019)
Higdon, D.: Space and space-time modeling using process convolutions. In: Quantitative methods for current environmental issues, Springer, pp. 37–56 (2002)
Kang, E.L., Cressie, N.: Bayesian inference for the spatial random effects model. J. Am. Stat. Assoc. 106(495), 972–983 (2011)
Katzfuss, M.: A multi-resolution approximation for massive spatial datasets. J. Am. Stat. Assoc. 112(517), 201–214 (2017)
Katzfuss, M., Cressie, N.: Spatio-temporal smoothing and EM estimation for massive remote-sensing data sets. J. Time Ser. Anal. 32(4), 430–446 (2011)
Katzfuss, M., Hammerling, D.: Parallel inference for massive distributed spatial data using low-rank models. Stat. Comput. 27(2), 363–375 (2017)
Kaufman, C.G., Schervish, M.J., Nychka, D.W.: Covariance tapering for likelihood-based estimation in large spatial data sets. J. Am. Stat. Assoc. 103(484), 1545–1555 (2008)
Kent, J.T., Mardia, K.V.: Spectral and circulant approximations to the likelihood for stationary Gaussian random fields. J. Statist. Plan. Inference 50(3), 379–394 (1996)
Lindgren, F., Rue, H., Lindström, J.: An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach. J. R. Statist. Soc. Ser. B Statist. Methodol. 73(4), 423–498 (2011)
Liu, H., Ong, Y.S., Shen, X., Cai, J.: When Gaussian process meets big data: A review of scalable GPs. IEEE Trans. Neural Netw. Learn. Syst. (2020)
Martino, S., Rue, H.: Implementing approximate Bayesian inference using integrated nested Laplace approximation: a manual for the INLA program. Department of Mathematical Sciences, NTNU, Norway (2009)
Matérn, B.: Spatial Variation. Lecture Notes in Statistics, vol. 36 (1960)
Minden, V., Damle, A., Ho, K.L., Ying, L.: Fast spatial Gaussian process maximum likelihood estimation via skeletonization factorizations. Multiscale Model. Simul. 15(4), 1584–1611 (2017)
Nychka, D., Bandyopadhyay, S., Hammerling, D., Lindgren, F., Sain, S.: A multiresolution Gaussian process model for the analysis of large spatial datasets. J. Comput. Graph. Stat. 24(2), 579–599 (2015)
Paciorek, C.J., Lipshitz, B., Zhuo, W., Kaufman, C.G., Thomas, R.C., et al.: Parallelizing Gaussian process calculations in R. J. Statist. Softw. 63(10) (2015)
Rue, H., Held, L.: Gaussian Markov random fields: theory and applications. CRC Press, Cambridge (2005)
Saad, Y.: Iterative methods for sparse linear systems, 2nd edn. Society for Industrial and Applied Mathematics, Philadelphia, PA (2003). https://doi.org/10.1137/1.9780898718003
Stein, M.L.: Fast and exact simulation of fractional Brownian surfaces. J. Comput. Graph. Stat. 11(3), 587–599 (2002)
Stein, M.L.: Statistical properties of covariance tapers. J. Comput. Graph. Stat. 22(4), 866–885 (2013)
Stein, M.L., Chi, Z., Welty, L.J.: Approximating likelihoods for large spatial data sets. J. R. Statist. Soc. Ser. B Statist. Methodol. 66(2), 275–296 (2004)
Sun, Y., Li, B., Genton, M.G.: Geostatistics for large datasets. In: Advances and challenges in space-time modelling of natural events, Springer, pp. 55–77 (2012)
Ubaru, S., Chen, J., Saad, Y.: Fast estimation of tr(f(\({A}\))) via stochastic Lanczos quadrature. SIAM J. Matrix Anal. Appl. 38(4), 1075–1099 (2017)
Varin, C., Reid, N., Firth, D.: An overview of composite likelihood methods. Statistica Sinica 21, 5–42 (2011)
Vecchia, A.V.: Estimation and model identification for continuous spatial processes. J. R. Stat. Soc. Ser. B Methodol. 50(2), 297–312 (1988)
Wendland, H.: Piecewise polynomial, positive definite and compactly supported radial functions of minimal degree. Adv. Comput. Math. 4(1), 389–396 (1995)
Widom, H.: Asymptotic behavior of block Toeplitz matrices and determinants. Adv. Math. 13(3), 284–322 (1974)
Acknowledgements
The authors were partially supported by the National Science Foundation through awards DMS-1845406 and DMS-1638521, by the National Institutes of Health through awards R01ES031651-01 and R01ES027892, and by the King Abdullah University of Science and Technology grant 3800.2. We would like to thank them for their support.
Funding
The authors were partially supported by the National Science Foundation through awards DMS-1845406 and DMS-1638521, by the National Institutes of Health through awards R01ES031651-01 and R01ES027892, and by the King Abdullah University of Science and Technology grant 3800.2.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Code availability
A GitHub repository has been set up that contains code and a demonstration file for the methods described in the article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Gradient and Hessian computation for the optimization procedure
In this section, we present the necessary details of computing and approximating the gradient and Hessian for the optimization routine.
We first derive exact expressions for the gradient and then show how to approximate them using the strategies in Sects. 3.2 and 3.1. Computing the analytical gradient requires the derivatives of \({\varvec{\Gamma } }= \left( \frac{1}{\sigma ^2}{\varvec{\Sigma }}(\varvec{\theta })^{-1} + \frac{1}{\tau ^2}{\mathbf {A}}^\mathsf {T}{\mathbf {A}}\right) ^{-1}\) and \({\widehat{{\mathbf {x}}}}(\varvec{\theta })\) with respect to each of \(\mu , \sigma ^2, \tau ^2\) and \(\rho \). For convenience, we reparameterize \(1/\sigma ^2 = \lambda ^2\) and \(1/\tau ^2 = \lambda _e^2\); working with precisions rather than variances simplifies the analytical derivatives. Under the new parameterization,
and
where \({\widehat{\varvec{\psi }}}(\varvec{\theta }) = {\mathbf {y}}- {\mathbf {X}}\varvec{\beta }- {\mathbf {A}}{\hat{{\mathbf {x}}}}(\varvec{\theta })\).
The derivatives of \(\varvec{\Gamma }\) with respect to \(\lambda ^2\), \(\lambda _e^2\) and \(\rho \) are
$$\begin{aligned} \frac{\partial \varvec{\Gamma }}{\partial \lambda ^2} = -\varvec{\Gamma }{\varvec{\Sigma }}(\rho )^{-1}\varvec{\Gamma }, \qquad \frac{\partial \varvec{\Gamma }}{\partial \lambda _e^2} = -\varvec{\Gamma }{\mathbf {A}}^\mathsf {T}{\mathbf {A}}\varvec{\Gamma }, \qquad \frac{\partial \varvec{\Gamma }}{\partial \rho } = \lambda ^2\,\varvec{\Gamma }{\varvec{\Sigma }}(\rho )^{-1}\,\text {d}{\varvec{\Sigma }}(\rho )\,{\varvec{\Sigma }}(\rho )^{-1}\varvec{\Gamma }, \end{aligned}$$
where \(\text {d}{\varvec{\Sigma }}(\rho )\) denotes the derivative of \({\varvec{\Sigma }}(\rho )\) with respect to \(\rho \). This derivative is easy to compute analytically and inherits the BTTB structure of \({\varvec{\Sigma }}(\rho )\).
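To make the BTTB structure of \(\text {d}{\varvec{\Sigma }}(\rho )\) concrete, the sketch below builds the arrays of grid-lag covariances that determine \({\varvec{\Sigma }}(\rho )\) and \(\text {d}{\varvec{\Sigma }}(\rho )\) on a regular grid. The exponential covariance, unit grid spacing and all parameter values are illustrative assumptions, not the specific choices of the paper.

```python
import numpy as np

# Minimal sketch (illustrative, not the authors' code): for a stationary
# covariance evaluated on a regular m1 x m2 grid, Sigma(rho) and its
# derivative d Sigma(rho) are both BTTB, so each is fully determined by the
# covariances at the distinct grid lags.  Here we use an exponential
# covariance C(h) = sigma2 * exp(-h / rho) purely as an example.

def base_lags(m1, m2):
    """Distances for all distinct lags (0..m1-1) x (0..m2-1) on a unit grid."""
    di, dj = np.meshgrid(np.arange(m1), np.arange(m2), indexing="ij")
    return np.hypot(di, dj)

def exp_cov(h, sigma2, rho):
    return sigma2 * np.exp(-h / rho)

def d_exp_cov_d_rho(h, sigma2, rho):
    # Analytical derivative of sigma2 * exp(-h / rho) with respect to rho.
    return sigma2 * (h / rho**2) * np.exp(-h / rho)

m1, m2, sigma2, rho = 50, 60, 1.0, 4.0
h = base_lags(m1, m2)
sigma_base = exp_cov(h, sigma2, rho)            # base block column of Sigma(rho)
dsigma_base = d_exp_cov_d_rho(h, sigma2, rho)   # base block column of d Sigma(rho)
# Both n x n BTTB matrices (n = m1 * m2) are represented by these
# m1 x m2 arrays; neither matrix is ever formed explicitly.
```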
Using the expressions in (18), we compute the derivatives of \({\widehat{{\mathbf {x}}}}(\varvec{\theta })\) to be
Substituting the analytical derivatives of \(\varvec{\Gamma }\) and \({\widehat{{\mathbf {x}}}}(\varvec{\theta })\) into the expression for the analytical gradient, we obtain
where \(\text {dL}\) is the derivative of \(\log \det \,{\varvec{\Sigma }}(\rho )\) with respect to \(\rho \).
We approximate the gradient expressions in (20) by approximating \({\widehat{{\mathbf {x}}}}(\varvec{\theta })\) by \({\mathbf {x}}_k^*(\varvec{\theta })\) as in (12) and using the exact arithmetic identities expressed in (10). The approximated gradients can be computed as
where \(\varvec{\psi }^*_k(\varvec{\theta }) = {\mathbf {y}}- {\mathbf {X}}\varvec{\beta }- {\mathbf {A}}{\mathbf {x}}^*_k(\varvec{\theta })\) and \({\mathbf {V}}_k, {\mathbf {z}}_k\) have been defined in Sect. 3.2.
\(\widehat{\text {dL}}\) is an approximation to \(\text {dL}\), the derivative of the log-determinant of \({\varvec{\Sigma }}(\rho )\) with respect to \(\rho \). The analytical expression is
$$\begin{aligned} \text {dL} = \text {tr}\left( {\varvec{\Sigma }}(\rho )^{-1}\,\text {d}{\varvec{\Sigma }}(\rho )\right) . \end{aligned}$$
This is infeasible to compute directly and is therefore approximated using the BTTB structure of \({\varvec{\Sigma }}(\rho )\) and \(\text {d}{\varvec{\Sigma }}(\rho )\).
Any symmetric BTTB matrix can be embedded in a block circulant matrix with circulant blocks (BCCB), as was done when computing the log-determinant itself, and the eigenvalues of the BTTB matrix can be approximated from those of its BCCB extension. Any BCCB matrix is diagonalizable as \({\mathbf {F}}{\mathbf {D}}{\mathbf {F}}^\mathsf{T}\), where \({\mathbf {F}}\) is a scaled matrix of d-dimensional (d = 2, in our case) Fourier coefficients that does not depend on the particular BCCB matrix being diagonalized. Therefore, writing the BCCB extensions of \({\varvec{\Sigma }}(\rho )\) and \(\text {d}{\varvec{\Sigma }}(\rho )\) as \({\mathbf {F}}{\mathbf {D}}_1{\mathbf {F}}^\mathsf{T}\) and \({\mathbf {F}}{\mathbf {D}}_2{\mathbf {F}}^\mathsf{T}\), respectively, we obtain the approximation
$$\begin{aligned} \text {dL} \approx \text {tr}\left( {\mathbf {D}}_1^{-1}{\mathbf {D}}_2\right) . \end{aligned}$$
Since both \({\mathbf {D}}_1\) and \({\mathbf {D}}_2\) are diagonal, approximating \(\text {dL}\) reduces to computing \({\mathbf {D}}_1\) and \({\mathbf {D}}_2\), which can be computed by d-dimensional FFTs of the first blocks of the corresponding extended BCCB structures, followed by appropriate subsetting. Kent and Mardia (1996) demonstrated the equivalence between the derivative of the log-determinant of a BTTB matrix and that of its corresponding BCCB matrix, showing that this approximation has the same error rate as the approximation of the log-determinant itself. Approximating the derivative of the log-determinant also has the same cost as approximating the log-determinant, \({\mathcal {O}}(n \log n)\).
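The sketch below is a rough numerical illustration of this idea, not the authors' implementation: it obtains the diagonals of \({\mathbf {D}}_1\) and \({\mathbf {D}}_2\) as 2-d FFTs of the wrapped base blocks and uses them to approximate the log-determinant and \(\text {dL}\). The unit-variance exponential covariance, the doubled embedding size and the simple \(n/N\) rescaling (in place of the paper's exact subsetting rule) are all assumptions made for illustration.

```python
import numpy as np

# Rough illustration of approximating log det Sigma(rho) and
# dL = tr(Sigma(rho)^{-1} d Sigma(rho)) via the BCCB extension, using only
# 2-d FFTs.  Covariance model, embedding size and n/N rescaling are
# illustrative assumptions.

def bccb_eigenvalues(cov_fun, m1, m2, *args):
    """Eigenvalues of the BCCB extension of a stationary covariance on an
    m1 x m2 grid, obtained as the 2-d FFT of the wrapped base block."""
    M1, M2 = 2 * m1, 2 * m2
    lag1 = np.minimum(np.arange(M1), M1 - np.arange(M1))
    lag2 = np.minimum(np.arange(M2), M2 - np.arange(M2))
    h = np.hypot(lag1[:, None], lag2[None, :])
    return np.fft.fft2(cov_fun(h, *args)).real  # real: the base is symmetric

def exp_cov(h, rho):
    return np.exp(-h / rho)

def d_exp_cov_d_rho(h, rho):
    return (h / rho**2) * np.exp(-h / rho)

m1, m2, rho = 50, 60, 4.0
n, N = m1 * m2, 4 * m1 * m2
d1 = bccb_eigenvalues(exp_cov, m1, m2, rho)           # diagonal of D1
d2 = bccb_eigenvalues(d_exp_cov_d_rho, m1, m2, rho)   # diagonal of D2
assert np.all(d1 > 0), "embedding not positive definite; pad further"

logdet_approx = (n / N) * np.sum(np.log(d1))   # circulant approximation of log det
dL_approx = (n / N) * np.sum(d2 / d1)          # circulant approximation of dL
print(logdet_approx, dL_approx)
```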
While minimizing the negative log-likelihood function, the expected Hessian is simply the information matrix \({\mathbf {I}}(\varvec{\theta })\). Writing \(\ell (\varvec{\theta })\) for the log-likelihood, while
$$\begin{aligned} {\mathbf {I}}(\varvec{\theta }) = -{\mathbb {E}}\left[ \nabla _2\, \ell (\varvec{\theta })\right] , \end{aligned}$$
we also have
$$\begin{aligned} {\mathbf {I}}(\varvec{\theta }) = {\mathbb {E}}\left[ \nabla \ell (\varvec{\theta })\, \nabla \ell (\varvec{\theta })^\mathsf{T}\right] . \end{aligned}$$
Here the expectations are taken with respect to \({\mathbf {y}}\), and \(\nabla \), \(\nabla _2\) denote the gradient and Hessian formed from the first- and second-order partial derivatives with respect to \(\varvec{\theta }\). Therefore, the outer product of the gradient with itself serves as a rank-one estimate of the Hessian in a likelihood optimization problem. Although we use the profile likelihood rather than the full likelihood, the approximation remains valid in an asymptotic sense since the maximum likelihood and profile likelihood estimators share the same asymptotic properties. This motivates taking the outer product of the approximated gradient with itself as a rank-one approximation to the Hessian.
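To make the information identity concrete, the following small Monte Carlo check verifies, for a toy i.i.d. Gaussian model (not the paper's spatial model), that the average outer product of the score matches the negative average Hessian. All names and parameter values here are illustrative.

```python
import numpy as np

# Monte Carlo check of the identity E[grad grad^T] = -E[Hessian] for a toy
# model y_i ~ N(mu, s2) with theta = (mu, s2).

rng = np.random.default_rng(0)
mu, s2, n, reps = 1.0, 2.0, 50, 20000

def score_and_hessian(y, mu, s2):
    """Gradient and Hessian of the log-likelihood of an i.i.d. N(mu, s2) sample."""
    m = y.size
    r = y - mu
    g = np.array([r.sum() / s2,
                  -m / (2 * s2) + (r @ r) / (2 * s2**2)])
    H = np.array([[-m / s2,                 -r.sum() / s2**2],
                  [-r.sum() / s2**2,  m / (2 * s2**2) - (r @ r) / s2**3]])
    return g, H

outer_sum = np.zeros((2, 2))
hess_sum = np.zeros((2, 2))
for _ in range(reps):
    y = rng.normal(mu, np.sqrt(s2), n)
    g, H = score_and_hessian(y, mu, s2)
    outer_sum += np.outer(g, g)
    hess_sum += H

print(outer_sum / reps)   # approx. Fisher information: diag(n/s2, n/(2 s2^2))
print(-hess_sum / reps)   # approx. the same matrix
```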
However, we compute the unique entries of the exact Hessian to be
where \(\text {d}^2\text {L}\) represents the second derivative of \(\log \det \,{\varvec{\Sigma }}(\rho )\) with respect to \(\rho \) and \(\text {d}^2{\varvec{\Sigma }}(\rho )\) is the second derivative of \({\varvec{\Sigma }}(\rho )\) with respect to \(\rho \); like \({\varvec{\Sigma }}(\rho )\) and \(\text {d}{\varvec{\Sigma }}(\rho )\), \(\text {d}^2{\varvec{\Sigma }}(\rho )\) also has BTTB structure.
These entries are then approximated using the approximation to \({\varvec{\Gamma }}\) as presented in Chung et al. (2018), namely
where \({\mathbf {Z}}_k = {\varvec{\Sigma }}(\rho ){\mathbf {V}}_k{\mathbf {W}}_k\) with \({\mathbf {B}}_k^\mathsf{T}{\mathbf {B}}_k = {\mathbf {W}}_k\varvec{\varTheta }_k{\mathbf {W}}_k\) and \(\varvec{\varDelta }_k = \left( {\mathbf {I}}+ \lambda ^{-2}\varvec{\varTheta }_k \right) ^{-1}\).
We define \({\mathbf {z}}_0 = {\mathbf {y}}- {\mathbf {X}}\varvec{\beta }- {\mathbf {A}}{\mathbf {x}}^*_k(\varvec{\theta }) = \varvec{\psi }^*_k(\varvec{\theta })\). The approximated entries of the Hessian are
where \(\widehat{\text {d}^2\text {L}}\) is a numerical approximation to \(\text {d}^2\text {L}\). We do not use this approximation in our current implementation, but hope to use it in future work.
Additional tables from the simulation study
In this section, we provide additional results from the simulation studies. Table 6 evaluates parameter estimation for the first simulation study for both the SPDE and Kryging methods; Tables 7 and 8 do the same for the second and third simulation studies. The results are similar across the board to those reported in Sect. 4: SPDE performs better in estimating the nugget parameter \(\tau ^2\), while Kryging performs better in estimating the partial sill parameter \(\sigma ^2\). Both methods do equally well in estimating the mean parameter \(\beta \) and the spatial range parameter \(\rho \).
About this article
Cite this article
Majumder, S., Guan, Y., Reich, B.J. et al. Kryging: geostatistical analysis of large-scale datasets using Krylov subspace methods. Stat Comput 32, 74 (2022). https://doi.org/10.1007/s11222-022-10104-3