
Kryging: geostatistical analysis of large-scale datasets using Krylov subspace methods

Statistics and Computing

Abstract

Analyzing massive spatial datasets using a Gaussian process model poses computational challenges. This problem is prevalent in applications such as environmental modeling, ecology, forestry and environmental health. We present a novel approximate inference methodology that uses profile likelihood and Krylov subspace methods to estimate the spatial covariance parameters and make spatial predictions with uncertainty quantification for point-referenced spatial data. “Kryging” combines Kriging and Krylov subspace methods and applies to both observations on a regular grid and irregularly spaced observations, and to any Gaussian process with a stationary isotropic (and certain geometrically anisotropic) covariance function, including the popular Matérn covariance family. We exploit the block-Toeplitz-with-Toeplitz-blocks (BTTB) structure of the covariance matrix and use fast Fourier transform methods to bypass the computational and memory bottlenecks of approximating the log-determinant and computing matrix–vector products. We perform extensive simulation studies to show the effectiveness of our model by varying sample sizes, spatial parameter values and sampling designs. We also apply the method to a dataset of land surface temperature readings taken by the MODIS satellite. Compared to existing methods, the proposed method performs satisfactorily, with much less computation time and better scalability.


Availability of Data and Material

The dataset analyzed in Sect. 5 is available in the GitHub repository for the Heaton et al. (2019) project.

References

  • Ambikasaran, S., Foreman-Mackey, D., Greengard, L., Hogg, D.W., O’Neil, M.: Fast direct methods for Gaussian processes. IEEE Trans. Pattern Anal. Mach. Intell. 38(2), 252–265 (2015)
  • Anitescu, M., Chen, J., Wang, L.: A matrix-free approach for solving the parametric Gaussian process maximum likelihood problem. SIAM J. Sci. Comput. 34(1), A240–A262 (2012)
  • Aune, E., Simpson, D.P., Eidsvik, J.: Parameter estimation in high dimensional Gaussian distributions. Stat. Comput. 24(2), 247–263 (2014)
  • Banerjee, S., Gelfand, A.E., Finley, A.O., Sang, H.: Gaussian predictive process models for large spatial data sets. J. R. Statist. Soc. Ser. B Statist. Methodol. 70(4), 825–848 (2008)
  • Barbian, M.H., Assunção, R.M.: Spatial subsemble estimator for large geostatistical data. Spat. Statist. 22, 68–88 (2017)
  • Benbow, S.J.: Solving generalized least-squares problems with LSQR. SIAM J. Matrix Anal. Appl. 21(1), 166–177 (1999)
  • Bradley, J.R., Cressie, N., Shi, T., et al.: A comparison of spatial predictors when datasets could be very large. Statist. Surv. 10, 100–131 (2016)
  • Chung, J., Saibaba, A.K.: Generalized hybrid iterative methods for large-scale Bayesian inverse problems. SIAM J. Sci. Comput. 39(5), S24–S46 (2017)
  • Chung, J., Saibaba, A.K., Brown, M., Westman, E.: Efficient generalized Golub–Kahan based methods for dynamic inverse problems. Inverse Prob. 34(2), 024005 (2018)
  • Cox, D.R., Snell, E.J.: Analysis of Binary Data, vol. 32. CRC Press, Cambridge (1989)
  • Cressie, N., Johannesson, G.: Fixed rank Kriging for very large spatial data sets. J. R. Statist. Soc. Ser. B Statist. Methodol. 70(1), 209–226 (2008)
  • Datta, A., Banerjee, S., Finley, A.O., Gelfand, A.E.: Hierarchical nearest-neighbor Gaussian process models for large geostatistical datasets. J. Am. Stat. Assoc. 111(514), 800–812 (2016)
  • Datta, A., Banerjee, S., Finley, A.O., Gelfand, A.E.: On nearest-neighbor Gaussian process models for massive spatial data. Wiley Interdiscip. Rev. Comput. Statist. 8(5), 162–171 (2016)
  • Datta, A., Banerjee, S., Finley, A.O., Hamm, N.A., Schaap, M.: Nonseparable dynamic nearest neighbor Gaussian process models for large spatio-temporal data with an application to particulate matter analysis. Ann. Appl. Statist. 10(3), 1286 (2016)
  • Den Hertog, D., Kleijnen, J.P., Siem, A.Y.: The correct Kriging variance estimated by bootstrapping. J. Oper. Res. Soc. 57(4), 400–409 (2006)
  • Dutta, S., Mondal, D.: REML estimation with intrinsic Matérn dependence in the spatial linear mixed model. Electr. J. Statist. 10(2), 2856–2893 (2016)
  • Eidsvik, J., Shaby, B.A., Reich, B.J., Wheeler, M., Niemi, J.: Estimation and prediction in spatial models with block composite likelihoods. J. Comput. Graph. Stat. 23(2), 295–315 (2014)
  • Eriksson, D., Dong, K., Lee, E., Bindel, D., Wilson, A.G.: Scaling Gaussian process regression with derivatives. In: Advances in Neural Information Processing Systems, pp. 6867–6877 (2018)
  • Finley, A.O., Sang, H., Banerjee, S., Gelfand, A.E.: Improving the performance of predictive process modeling for large datasets. Comput. Statist. Data Anal. 53(8), 2873–2884 (2009)
  • Fuentes, M.: Approximate likelihood for large irregularly spaced spatial data. J. Am. Stat. Assoc. 102(477), 321–331 (2007)
  • Furrer, R., Genton, M.G., Nychka, D.: Covariance tapering for interpolation of large spatial datasets. J. Comput. Graph. Stat. 15(3), 502–523 (2006)
  • Gneiting, T., Ševčíková, H., Percival, D.B., Schlather, M., Jiang, Y.: Fast and exact simulation of large Gaussian lattice systems in \({\mathbb{R}}^2\): exploring the limits. J. Comput. Graph. Stat. 15(3), 483–501 (2006)
  • Graham, I.G., Kuo, F.Y., Nuyens, D., Scheichl, R., Sloan, I.H.: Analysis of circulant embedding methods for sampling stationary random fields. SIAM J. Numer. Anal. 56(3), 1871–1895 (2018)
  • Gray, R.M.: Toeplitz and circulant matrices: a review. Found. Trends Commun. Inf. Theory 2(3), 155–239 (2006)
  • Guhaniyogi, R., Banerjee, S.: Meta-Kriging: scalable Bayesian modeling and inference for massive spatial datasets. Technometrics 60(4), 430–444 (2018)
  • Guinness, J.: Spectral density estimation for random fields via periodic embeddings. Biometrika 106(2), 267–286 (2019)
  • Guinness, J., Fuentes, M.: Circulant embedding of approximate covariances for inference from Gaussian data on large lattices. J. Comput. Graph. Stat. 26(1), 88–97 (2017)
  • Gyires, B.: Eigenwerte verallgemeinerter Toeplitzschen Matrizen. Publ. Math. Debrecen 4, 171–179 (1956)
  • Heaton, M.J., Datta, A., Finley, A.O., Furrer, R., Guinness, J., Guhaniyogi, R., Gerber, F., Gramacy, R.B., Hammerling, D., Katzfuss, M., et al.: A case study competition among methods for analyzing large spatial data. J. Agric. Biol. Environ. Stat. 24(3), 398–425 (2019)
  • Higdon, D.: Space and space-time modeling using process convolutions. In: Quantitative Methods for Current Environmental Issues, pp. 37–56. Springer (2002)
  • Kang, E.L., Cressie, N.: Bayesian inference for the spatial random effects model. J. Am. Stat. Assoc. 106(495), 972–983 (2011)
  • Katzfuss, M.: A multi-resolution approximation for massive spatial datasets. J. Am. Stat. Assoc. 112(517), 201–214 (2017)
  • Katzfuss, M., Cressie, N.: Spatio-temporal smoothing and EM estimation for massive remote-sensing data sets. J. Time Ser. Anal. 32(4), 430–446 (2011)
  • Katzfuss, M., Hammerling, D.: Parallel inference for massive distributed spatial data using low-rank models. Stat. Comput. 27(2), 363–375 (2017)
  • Kaufman, C.G., Schervish, M.J., Nychka, D.W.: Covariance tapering for likelihood-based estimation in large spatial data sets. J. Am. Stat. Assoc. 103(484), 1545–1555 (2008)
  • Kent, J.T., Mardia, K.V.: Spectral and circulant approximations to the likelihood for stationary Gaussian random fields. J. Statist. Plan. Inference 50(3), 379–394 (1996)
  • Lindgren, F., Rue, H., Lindström, J.: An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach. J. R. Statist. Soc. Ser. B Statist. Methodol. 73(4), 423–498 (2011)
  • Liu, H., Ong, Y.S., Shen, X., Cai, J.: When Gaussian process meets big data: a review of scalable GPs. IEEE Trans. Neural Netw. Learn. Syst. (2020)
  • Martino, S., Rue, H.: Implementing approximate Bayesian inference using integrated nested Laplace approximation: a manual for the INLA program. Department of Mathematical Sciences, NTNU, Norway (2009)
  • Matérn, B.: Spatial Variation. Lecture Notes in Statistics, vol. 36 (1960)
  • Minden, V., Damle, A., Ho, K.L., Ying, L.: Fast spatial Gaussian process maximum likelihood estimation via skeletonization factorizations. Multiscale Model. Simul. 15(4), 1584–1611 (2017)
  • Nychka, D., Bandyopadhyay, S., Hammerling, D., Lindgren, F., Sain, S.: A multiresolution Gaussian process model for the analysis of large spatial datasets. J. Comput. Graph. Stat. 24(2), 579–599 (2015)
  • Paciorek, C.J., Lipshitz, B., Zhuo, W., Kaufman, C.G., Thomas, R.C., et al.: Parallelizing Gaussian process calculations in R. J. Statist. Softw. 63(10) (2015)
  • Rue, H., Held, L.: Gaussian Markov Random Fields: Theory and Applications. CRC Press, Cambridge (2005)
  • Saad, Y.: Iterative Methods for Sparse Linear Systems, 2nd edn. Society for Industrial and Applied Mathematics, Philadelphia, PA (2003). https://doi.org/10.1137/1.9780898718003
  • Stein, M.L.: Fast and exact simulation of fractional Brownian surfaces. J. Comput. Graph. Stat. 11(3), 587–599 (2002)
  • Stein, M.L.: Statistical properties of covariance tapers. J. Comput. Graph. Stat. 22(4), 866–885 (2013)
  • Stein, M.L., Chi, Z., Welty, L.J.: Approximating likelihoods for large spatial data sets. J. R. Statist. Soc. Ser. B Statist. Methodol. 66(2), 275–296 (2004)
  • Sun, Y., Li, B., Genton, M.G.: Geostatistics for large datasets. In: Advances and Challenges in Space-Time Modelling of Natural Events, pp. 55–77. Springer (2012)
  • Ubaru, S., Chen, J., Saad, Y.: Fast estimation of \(\mathrm{tr}(f(\mathbf{A}))\) via stochastic Lanczos quadrature. SIAM J. Matrix Anal. Appl. 38(4), 1075–1099 (2017)
  • Varin, C., Reid, N., Firth, D.: An overview of composite likelihood methods. Statist. Sinica 21, 5–42 (2011)
  • Vecchia, A.V.: Estimation and model identification for continuous spatial processes. J. R. Statist. Soc. Ser. B Statist. Methodol. 50(2), 297–312 (1988)
  • Wendland, H.: Piecewise polynomial, positive definite and compactly supported radial functions of minimal degree. Adv. Comput. Math. 4(1), 389–396 (1995)
  • Widom, H.: Asymptotic behavior of block Toeplitz matrices and determinants. Adv. Math. 13(3), 284–322 (1974)


Acknowledgements

The authors were partially supported by the National Science Foundation through awards DMS-1845406 and DMS-1638521, by the National Institutes of Health through awards R01ES031651-01 and R01ES027892, and by King Abdullah University of Science and Technology grant 3800.2. We thank these agencies for their support.


Corresponding author

Correspondence to Suman Majumder.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Code availability

A GitHub repository has been set up that contains the code and a demonstration file for the methods described in the article.


Appendices

Gradient and Hessian computation for the optimization procedure

In this section, we present the necessary details of computing and approximating the gradient and Hessian for the optimization routine.

We first derive exact expressions for the gradient and then show how to approximate them using the strategies in Sects. 3.1 and 3.2. Computing the analytical gradient requires derivatives of \(\boldsymbol{\Gamma} = \left( \frac{1}{\sigma^2}\boldsymbol{\Sigma}(\boldsymbol{\theta})^{-1} + \frac{1}{\tau^2}\mathbf{A}^\mathsf{T}\mathbf{A} \right)^{-1}\) and \(\widehat{\mathbf{x}}(\boldsymbol{\theta})\) with respect to each of \(\boldsymbol{\beta}, \sigma^2, \tau^2\) and \(\rho\). For convenience, we reparameterize \(1/\sigma^2 = \lambda^2\) and \(1/\tau^2 = \lambda_e^2\); working with precisions rather than variances makes the analytical derivatives easier to compute. Under the new parameterization,

$$\begin{aligned} \boldsymbol{\Gamma} &= \left( \lambda_e^2\,\mathbf{A}^\mathsf{T}\mathbf{A} + \lambda^2\,\boldsymbol{\Sigma}^{-1} \right)^{-1}, \end{aligned}$$
(15)
$$\begin{aligned} \widehat{\mathbf{x}}(\boldsymbol{\theta}) &= \lambda_e^2\,\boldsymbol{\Gamma}\mathbf{A}^\mathsf{T}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}), \end{aligned}$$
(16)

and

$$\begin{aligned} \mathrm{pl}(\boldsymbol{\theta}) \simeq{}& \frac{p}{2}\log \lambda_e^2 - \frac{\lambda_e^2}{2}\widehat{\boldsymbol{\psi}}(\boldsymbol{\theta})^\mathsf{T}\widehat{\boldsymbol{\psi}}(\boldsymbol{\theta}) + \frac{n}{2}\log \lambda^2 \\ & - \frac{1}{2}\log\det\boldsymbol{\Sigma}(\boldsymbol{\theta}) - \frac{\lambda^2}{2}\widehat{\mathbf{x}}(\boldsymbol{\theta})^\mathsf{T}\boldsymbol{\Sigma}(\boldsymbol{\theta})^{-1}\widehat{\mathbf{x}}(\boldsymbol{\theta}), \end{aligned}$$
(17)

where \(\widehat{\boldsymbol{\psi}}(\boldsymbol{\theta}) = \mathbf{y} - \mathbf{X}\boldsymbol{\beta} - \mathbf{A}\widehat{\mathbf{x}}(\boldsymbol{\theta})\).
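
For concreteness, the following minimal sketch (ours, not the paper's implementation; all names are illustrative) evaluates (15)–(17) with dense linear algebra for a small problem. The paper's contribution is precisely to avoid these \(O(n^3)\) operations, so the sketch serves only as a reference against which the Krylov approximations can be checked.

```python
import numpy as np

def profile_loglik_dense(y, X, beta, A, Sigma, lam2, lam2_e):
    """Dense evaluation of (15)-(17); small-n reference only."""
    n = Sigma.shape[0]                     # latent grid points
    p = y.shape[0]                         # observations
    Sigma_inv = np.linalg.inv(Sigma)
    Gamma = np.linalg.inv(lam2_e * (A.T @ A) + lam2 * Sigma_inv)   # (15)
    r = y - X @ beta
    x_hat = lam2_e * (Gamma @ (A.T @ r))                           # (16)
    psi = r - A @ x_hat
    _, logdet = np.linalg.slogdet(Sigma)
    return (0.5 * p * np.log(lam2_e) - 0.5 * lam2_e * (psi @ psi)  # (17)
            + 0.5 * n * np.log(lam2) - 0.5 * logdet
            - 0.5 * lam2 * (x_hat @ (Sigma_inv @ x_hat)))
```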

The derivatives of \(\boldsymbol{\Gamma}\) are computed to be

$$\begin{aligned} \frac{\partial \boldsymbol{\Gamma}}{\partial \boldsymbol{\beta}} &= \mathbf{0}, \\ \frac{\partial \boldsymbol{\Gamma}}{\partial \lambda^2} &= -\boldsymbol{\Gamma}\boldsymbol{\Sigma}(\rho)^{-1}\boldsymbol{\Gamma}, \\ \frac{\partial \boldsymbol{\Gamma}}{\partial \rho} &= \lambda^2\,\boldsymbol{\Gamma}\boldsymbol{\Sigma}(\rho)^{-1}\left( \mathrm{d}\boldsymbol{\Sigma}(\rho) \right)\boldsymbol{\Sigma}(\rho)^{-1}\boldsymbol{\Gamma}, \\ \frac{\partial \boldsymbol{\Gamma}}{\partial \lambda_e^2} &= -\boldsymbol{\Gamma}\mathbf{A}^\mathsf{T}\mathbf{A}\boldsymbol{\Gamma}, \end{aligned}$$
(18)

where \(\mathrm{d}\boldsymbol{\Sigma}(\rho)\) denotes the derivative of \(\boldsymbol{\Sigma}(\rho)\) with respect to \(\rho\). It is easy to compute analytically and inherits the BTTB structure of \(\boldsymbol{\Sigma}(\rho)\).
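
As a concrete illustration (our example, not from the paper), for the exponential covariance \(\exp(-d/\rho)\), the Matérn member with smoothness 1/2, both \(\boldsymbol{\Sigma}(\rho)\) and \(\mathrm{d}\boldsymbol{\Sigma}(\rho)\) depend on locations only through grid offsets, so each is determined by a single base block of offset values:

```python
import numpy as np

def exp_cov_base_blocks(m1, m2, rho, h=1.0):
    """Base blocks of Sigma(rho) and d Sigma(rho)/d rho for the
    exponential covariance exp(-d/rho) on an m1 x m2 grid with
    spacing h.  Entries depend only on the offset (i, j), so both
    blocks define BTTB matrices."""
    i, j = np.meshgrid(np.arange(m1), np.arange(m2), indexing="ij")
    d = h * np.hypot(i, j)                  # distance at offset (i, j)
    cov = np.exp(-d / rho)
    dcov = (d / rho**2) * np.exp(-d / rho)  # elementwise d/d rho
    return cov, dcov
```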

Using the expressions in (18), we compute the derivatives of \(\widehat{\mathbf{x}}(\boldsymbol{\theta})\) to be

$$\begin{aligned} \frac{\partial \widehat{\mathbf{x}}(\boldsymbol{\theta})}{\partial \boldsymbol{\beta}} &= -\lambda_e^2\,\boldsymbol{\Gamma}\mathbf{A}^\mathsf{T}\mathbf{X}, \\ \frac{\partial \widehat{\mathbf{x}}(\boldsymbol{\theta})}{\partial \lambda^2} &= -\boldsymbol{\Gamma}\boldsymbol{\Sigma}(\rho)^{-1}\widehat{\mathbf{x}}(\boldsymbol{\theta}), \\ \frac{\partial \widehat{\mathbf{x}}(\boldsymbol{\theta})}{\partial \rho} &= \lambda^2\,\boldsymbol{\Gamma}\boldsymbol{\Sigma}(\rho)^{-1}\left( \mathrm{d}\boldsymbol{\Sigma}(\rho) \right)\boldsymbol{\Sigma}(\rho)^{-1}\widehat{\mathbf{x}}(\boldsymbol{\theta}), \\ \frac{\partial \widehat{\mathbf{x}}(\boldsymbol{\theta})}{\partial \lambda_e^2} &= \boldsymbol{\Gamma}\mathbf{A}^\mathsf{T}\widehat{\boldsymbol{\psi}}(\boldsymbol{\theta}). \end{aligned}$$
(19)

Substituting the analytical derivatives of \(\boldsymbol{\Gamma}\) and \(\widehat{\mathbf{x}}(\boldsymbol{\theta})\) into the expression for the gradient, we obtain

$$\begin{aligned} \frac{\partial\, \mathrm{pl}}{\partial \boldsymbol{\beta}} &= \lambda_e^2\,\mathbf{X}^\mathsf{T}\widehat{\boldsymbol{\psi}}(\boldsymbol{\theta}), \\ \frac{\partial\, \mathrm{pl}}{\partial \lambda^2} &= \frac{n}{2\lambda^2} - \frac{1}{2}\widehat{\mathbf{x}}(\boldsymbol{\theta})^\mathsf{T}\boldsymbol{\Sigma}(\rho)^{-1}\widehat{\mathbf{x}}(\boldsymbol{\theta}), \\ \frac{\partial\, \mathrm{pl}}{\partial \rho} &= -\frac{1}{2}\mathrm{dL} + \frac{\lambda^2}{2}\widehat{\mathbf{x}}(\boldsymbol{\theta})^\mathsf{T}\boldsymbol{\Sigma}(\rho)^{-1}\left( \mathrm{d}\boldsymbol{\Sigma}(\rho) \right)\boldsymbol{\Sigma}(\rho)^{-1}\widehat{\mathbf{x}}(\boldsymbol{\theta}), \\ \frac{\partial\, \mathrm{pl}}{\partial \lambda_e^2} &= \frac{p}{2\lambda_e^2} - \frac{1}{2}\widehat{\boldsymbol{\psi}}(\boldsymbol{\theta})^\mathsf{T}\widehat{\boldsymbol{\psi}}(\boldsymbol{\theta}), \end{aligned}$$
(20)

where \(\mathrm{dL}\) is the derivative of \(\log\det\boldsymbol{\Sigma}(\rho)\) with respect to \(\rho\).

We approximate the gradient expressions in (20) by replacing \(\widehat{\mathbf{x}}(\boldsymbol{\theta})\) with \(\mathbf{x}_k^*(\boldsymbol{\theta})\) as in (12) and using the exact-arithmetic identities in (10). The approximate gradients are

$$\begin{aligned} \frac{\partial\, \mathrm{pl}}{\partial \boldsymbol{\beta}} &\approx \lambda_e^2\,\mathbf{X}^\mathsf{T}\boldsymbol{\psi}^*_k(\boldsymbol{\theta}), \\ \frac{\partial\, \mathrm{pl}}{\partial \lambda^2} &\approx \frac{n}{2\lambda^2} - \frac{1}{2}\Vert \mathbf{z}_k\Vert_2^2, \\ \frac{\partial\, \mathrm{pl}}{\partial \rho} &\approx -\frac{1}{2}\widehat{\mathrm{dL}} + \frac{\lambda^2}{2}\mathbf{z}_k^\mathsf{T}\mathbf{V}_k^\mathsf{T}\left( \mathrm{d}\boldsymbol{\Sigma}(\rho) \right)\mathbf{V}_k\mathbf{z}_k, \\ \frac{\partial\, \mathrm{pl}}{\partial \lambda_e^2} &\approx \frac{p}{2\lambda_e^2} - \frac{1}{2}\boldsymbol{\psi}^*_k(\boldsymbol{\theta})^\mathsf{T}\boldsymbol{\psi}^*_k(\boldsymbol{\theta}), \end{aligned}$$
(21)

where \(\boldsymbol{\psi}^*_k(\boldsymbol{\theta}) = \mathbf{y} - \mathbf{X}\boldsymbol{\beta} - \mathbf{A}\mathbf{x}^*_k(\boldsymbol{\theta})\), and \(\mathbf{V}_k, \mathbf{z}_k\) are as defined in Sect. 3.2.

Here \(\widehat{\mathrm{dL}}\) is an approximation to \(\mathrm{dL}\), the derivative of the log-determinant of \(\boldsymbol{\Sigma}(\rho)\) with respect to \(\rho\). The analytical expression for \(\mathrm{dL}\) is

$$\begin{aligned} \mathrm{dL} = \mathrm{trace}\left( \boldsymbol{\Sigma}(\rho)^{-1}\,\mathrm{d}\boldsymbol{\Sigma}(\rho) \right). \end{aligned}$$

This is infeasible to compute directly and is therefore approximated using the BTTB structure of \(\boldsymbol{\Sigma}(\rho)\) and \(\mathrm{d}\boldsymbol{\Sigma}(\rho)\).

Any symmetric BTTB matrix can be extended to a BCCB (block circulant with circulant blocks) matrix, as was done when computing the log-determinant itself, and the eigenvalues of the BTTB matrix can be extracted using this BCCB extension. Every BCCB matrix is diagonalizable as \(\mathbf{F}\mathbf{D}\mathbf{F}^\mathsf{T}\), where \(\mathbf{F}\) is a scaled matrix of \(d\)-dimensional (\(d = 2\), in our case) Fourier coefficients that does not depend on the particular BCCB matrix being diagonalized. Therefore, we can write

$$\begin{aligned} \boldsymbol{\Sigma}(\rho) &= \mathbf{F}\mathbf{D}_1\mathbf{F}^\mathsf{T}, \\ \boldsymbol{\Sigma}(\rho)^{-1} &= \mathbf{F}\mathbf{D}_1^{-1}\mathbf{F}^\mathsf{T}, \\ \mathrm{d}\boldsymbol{\Sigma}(\rho) &= \mathbf{F}\mathbf{D}_2\mathbf{F}^\mathsf{T}. \end{aligned}$$
(22)

These imply that

$$\begin{aligned} \mathrm{trace}\left( \boldsymbol{\Sigma}(\rho)^{-1}\,\mathrm{d}\boldsymbol{\Sigma}(\rho) \right) = \mathrm{trace}\left( \mathbf{F}\mathbf{D}_1^{-1}\mathbf{F}^\mathsf{T}\mathbf{F}\mathbf{D}_2\mathbf{F}^\mathsf{T} \right) = \mathrm{trace}\left( \mathbf{D}_1^{-1}\mathbf{D}_2 \right). \end{aligned}$$
(23)

Since both \(\mathbf{D}_1\) and \(\mathbf{D}_2\) are diagonal, approximating \(\mathrm{dL}\) reduces to computing \(\mathbf{D}_1\) and \(\mathbf{D}_2\), which are obtained by a \(d\)-dimensional FFT of the first circulant blocks of the extended BCCB structures, followed by appropriate subsetting. The equivalence between the derivative of the log-determinant of a BTTB matrix and that of its corresponding BCCB matrix was demonstrated by Kent and Mardia (1996), who showed that the approximation has the same error rate as the approximation of the log-determinant itself. Approximating the derivative of the log-determinant also costs the same as approximating the log-determinant itself, \({\mathcal{O}}(n \log n)\).
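
A minimal sketch of this computation, continuing the base-block example above; the reflection embedding of size \((2m_1-2)\times(2m_2-2)\) and the \(n/N\) rescaling from the embedding back to the observed grid are common conventions for circulant approximations of this kind, and the function names are ours:

```python
import numpy as np

def bccb_eigenvalues(base):
    """Eigenvalues of the BCCB extension of a symmetric BTTB matrix,
    obtained from a 2-D FFT of the reflected base block."""
    m1, m2 = base.shape
    ext = np.zeros((2 * m1 - 2, 2 * m2 - 2))
    ext[:m1, :m2] = base
    ext[m1:, :m2] = base[m1 - 2:0:-1, :]            # reflect block rows
    ext[:m1, m2:] = base[:, m2 - 2:0:-1]            # reflect within blocks
    ext[m1:, m2:] = base[m1 - 2:0:-1, m2 - 2:0:-1]
    return np.fft.fft2(ext).real                    # symmetric => real

def approx_dL(cov_base, dcov_base, n):
    """Approximate dL = trace(D1^{-1} D2) using the BCCB embeddings
    of Sigma(rho) and d Sigma(rho), rescaled to the n grid points."""
    lam1 = bccb_eigenvalues(cov_base)               # diagonal of D1
    lam2 = bccb_eigenvalues(dcov_base)              # diagonal of D2
    return (n / lam1.size) * np.sum(lam2 / lam1)
```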

When minimizing the negative log-likelihood function, the expected Hessian is simply the information matrix \(\mathbf{I}(\boldsymbol{\theta})\). While

$$\begin{aligned} {\mathbb{E}}\left( -\nabla_2\, \mathrm{pl}(\boldsymbol{\theta}) \right) = \mathbf{I}(\boldsymbol{\theta}), \end{aligned}$$

we also have, by the information identity,

$$\begin{aligned} {\mathbb{E}}\left( \nabla\, \mathrm{pl}(\boldsymbol{\theta})\, \nabla\, \mathrm{pl}(\boldsymbol{\theta})^\mathsf{T} \right) = \mathbf{I}(\boldsymbol{\theta}). \end{aligned}$$

Here the expectations are taken with respect to \(\mathbf{y}\), and \(\nabla\) and \(\nabla_2\) denote the gradient and Hessian formed from first- and second-order partial derivatives with respect to \(\boldsymbol{\theta}\). Therefore, the outer product of the gradient with itself serves as a rank-one estimate of the Hessian in a likelihood optimization problem. Although we use the profile likelihood rather than the full likelihood, the approximation still holds in an asymptotic sense, since the maximum likelihood and maximum profile likelihood estimators share the same asymptotic properties. This motivates taking the outer product of the approximated gradient with itself as a rank-one approximation to the Hessian.
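
A sketch of how this rank-one estimate might drive a Newton-type update; `grad_fn` and the damping term are our illustrative choices (the damping keeps the rank-one surrogate invertible), not details taken from the paper:

```python
import numpy as np

def rank_one_newton_step(theta, grad_fn, damping=1e-6):
    """One ascent step on pl(theta), using the outer product of the
    approximated gradient (21) as the Hessian surrogate."""
    g = np.asarray(grad_fn(theta))
    H = np.outer(g, g) + damping * np.eye(g.size)   # rank-one + damping
    return theta + np.linalg.solve(H, g)
```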

However, we compute the unique entries of the exact Hessian to be

$$\begin{aligned} \frac{\partial^2 \mathrm{pl}}{\partial \boldsymbol{\beta}\partial \boldsymbol{\beta}^\mathsf{T}} &= -\lambda_e^2\,\mathbf{X}^\mathsf{T}\mathbf{X} + \lambda_e^4\,\mathbf{X}^\mathsf{T}\mathbf{A}\boldsymbol{\Gamma}\mathbf{A}^\mathsf{T}\mathbf{X}, \\ \frac{\partial^2 \mathrm{pl}}{\partial \boldsymbol{\beta}\partial \lambda^2} &= \lambda_e^2\,\mathbf{X}^\mathsf{T}\mathbf{A}\boldsymbol{\Gamma}\boldsymbol{\Sigma}(\rho)^{-1}\widehat{\mathbf{x}}(\boldsymbol{\theta}), \\ \frac{\partial^2 \mathrm{pl}}{\partial \boldsymbol{\beta}\partial \rho} &= -\lambda_e^2\lambda^2\,\mathbf{X}^\mathsf{T}\mathbf{A}\boldsymbol{\Gamma}\boldsymbol{\Sigma}(\rho)^{-1}\left( \mathrm{d}\boldsymbol{\Sigma}(\rho) \right)\boldsymbol{\Sigma}(\rho)^{-1}\widehat{\mathbf{x}}(\boldsymbol{\theta}), \\ \frac{\partial^2 \mathrm{pl}}{\partial \boldsymbol{\beta}\partial \lambda_e^2} &= \mathbf{X}^\mathsf{T}\widehat{\boldsymbol{\psi}}(\boldsymbol{\theta}) - \lambda_e^2\,\mathbf{X}^\mathsf{T}\mathbf{A}\boldsymbol{\Gamma}\mathbf{A}^\mathsf{T}\widehat{\boldsymbol{\psi}}(\boldsymbol{\theta}), \\ \frac{\partial^2 \mathrm{pl}}{\partial \lambda^4} &= -\frac{n}{2\lambda^4} + \widehat{\mathbf{x}}(\boldsymbol{\theta})^\mathsf{T}\boldsymbol{\Sigma}(\rho)^{-1}\boldsymbol{\Gamma}\boldsymbol{\Sigma}(\rho)^{-1}\widehat{\mathbf{x}}(\boldsymbol{\theta}), \\ \frac{\partial^2 \mathrm{pl}}{\partial \lambda^2\partial \rho} &= \frac{1}{2}\widehat{\mathbf{x}}(\boldsymbol{\theta})^\mathsf{T}\boldsymbol{\Sigma}(\rho)^{-1}\left( \mathrm{d}\boldsymbol{\Sigma}(\rho) \right)\boldsymbol{\Sigma}(\rho)^{-1}\widehat{\mathbf{x}}(\boldsymbol{\theta}) \\ &\quad - \lambda^2\,\widehat{\mathbf{x}}(\boldsymbol{\theta})^\mathsf{T}\boldsymbol{\Sigma}(\rho)^{-1}\left( \mathrm{d}\boldsymbol{\Sigma}(\rho) \right)\boldsymbol{\Sigma}(\rho)^{-1}\boldsymbol{\Gamma}\boldsymbol{\Sigma}(\rho)^{-1}\widehat{\mathbf{x}}(\boldsymbol{\theta}), \\ \frac{\partial^2 \mathrm{pl}}{\partial \lambda^2\partial \lambda_e^2} &= -\widehat{\mathbf{x}}(\boldsymbol{\theta})^\mathsf{T}\boldsymbol{\Sigma}(\rho)^{-1}\boldsymbol{\Gamma}\mathbf{A}^\mathsf{T}\widehat{\boldsymbol{\psi}}(\boldsymbol{\theta}), \\ \frac{\partial^2 \mathrm{pl}}{\partial \rho^2} &= -\frac{1}{2}\mathrm{d}^2\mathrm{L} + \frac{\lambda^2}{2}\widehat{\mathbf{x}}(\boldsymbol{\theta})^\mathsf{T}\boldsymbol{\Sigma}(\rho)^{-1}\left( \mathrm{d}^2\boldsymbol{\Sigma}(\rho) \right)\boldsymbol{\Sigma}(\rho)^{-1}\widehat{\mathbf{x}}(\boldsymbol{\theta}) \\ &\quad - \lambda^2\,\widehat{\mathbf{x}}(\boldsymbol{\theta})^\mathsf{T}\boldsymbol{\Sigma}(\rho)^{-1}\left( \mathrm{d}\boldsymbol{\Sigma}(\rho) \right)\left[ \boldsymbol{\Sigma}(\rho)^{-1} - \boldsymbol{\Sigma}(\rho)^{-1}\boldsymbol{\Gamma}\boldsymbol{\Sigma}(\rho)^{-1} \right]\left( \mathrm{d}\boldsymbol{\Sigma}(\rho) \right)\boldsymbol{\Sigma}(\rho)^{-1}\widehat{\mathbf{x}}(\boldsymbol{\theta}), \\ \frac{\partial^2 \mathrm{pl}}{\partial \rho\partial \lambda_e^2} &= \lambda^2\,\widehat{\mathbf{x}}(\boldsymbol{\theta})^\mathsf{T}\boldsymbol{\Sigma}(\rho)^{-1}\left( \mathrm{d}\boldsymbol{\Sigma}(\rho) \right)\boldsymbol{\Sigma}(\rho)^{-1}\boldsymbol{\Gamma}\mathbf{A}^\mathsf{T}\widehat{\boldsymbol{\psi}}(\boldsymbol{\theta}), \\ \frac{\partial^2 \mathrm{pl}}{\partial \lambda_e^4} &= -\frac{p}{2\lambda_e^4} + \widehat{\boldsymbol{\psi}}(\boldsymbol{\theta})^\mathsf{T}\mathbf{A}\boldsymbol{\Gamma}\mathbf{A}^\mathsf{T}\widehat{\boldsymbol{\psi}}(\boldsymbol{\theta}), \end{aligned}$$
(24)

where \(\mathrm{d}^2\mathrm{L}\) denotes the second derivative of \(\log\det\boldsymbol{\Sigma}(\rho)\) with respect to \(\rho\), and \(\mathrm{d}^2\boldsymbol{\Sigma}(\rho)\) is the second derivative of \(\boldsymbol{\Sigma}(\rho)\) with respect to \(\rho\). Like \(\boldsymbol{\Sigma}(\rho)\) and \(\mathrm{d}\boldsymbol{\Sigma}(\rho)\), \(\mathrm{d}^2\boldsymbol{\Sigma}(\rho)\) has a BTTB structure.

These entries are then approximated using the approximation to \(\boldsymbol{\Gamma}\) presented in Chung et al. (2018), namely

$$\begin{aligned} \boldsymbol{\Gamma} \approx \lambda^{-2}\left( \boldsymbol{\Sigma}(\rho) - \mathbf{Z}_k\boldsymbol{\Delta}_k\mathbf{Z}_k^\mathsf{T} \right), \end{aligned}$$
(25)

where \(\mathbf{Z}_k = \boldsymbol{\Sigma}(\rho)\mathbf{V}_k\mathbf{W}_k\), with \(\mathbf{B}_k^\mathsf{T}\mathbf{B}_k = \mathbf{W}_k\boldsymbol{\Theta}_k\mathbf{W}_k^\mathsf{T}\) and \(\boldsymbol{\Delta}_k = \left( \mathbf{I} + \lambda^{-2}\boldsymbol{\Theta}_k \right)^{-1}\).
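
Under this approximation, \(\boldsymbol{\Gamma}\) never needs to be formed explicitly: applying (25) to a vector costs one product with \(\boldsymbol{\Sigma}(\rho)\) (FFT-based, in practice) and two thin-matrix products. A sketch, with `sigma_matvec` standing in for the BTTB matrix–vector product (name ours):

```python
def apply_Gamma_lowrank(v, sigma_matvec, Z_k, Delta_k, lam2):
    """Apply the low-rank approximation (25) to a vector v:
    Gamma v ~= (Sigma(rho) v - Z_k Delta_k Z_k^T v) / lam2."""
    return (sigma_matvec(v) - Z_k @ (Delta_k @ (Z_k.T @ v))) / lam2
```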

Table 6 RMSE in estimating the parameters for SPDE and Kryging for different grid sizes and choices of k as in the first simulation study. The true values for the parameters were (44.49, 3, 0.5, 1). The figures in brackets indicate standard error

We define \(\mathbf{z}_0 = \mathbf{y} - \mathbf{X}\boldsymbol{\beta} - \mathbf{A}\mathbf{x}^*_k(\boldsymbol{\theta}) = \boldsymbol{\psi}^*_k(\boldsymbol{\theta})\). The approximated entries of the Hessian are

$$\begin{aligned} \frac{\partial^2 \mathrm{pl}}{\partial \boldsymbol{\beta}\partial \boldsymbol{\beta}^\mathsf{T}} &\approx -\lambda_e^2\,\mathbf{X}^\mathsf{T}\mathbf{X} + \frac{\lambda_e^4}{\lambda^2}\mathbf{X}^\mathsf{T}\mathbf{A}\boldsymbol{\Sigma}(\rho)\mathbf{A}^\mathsf{T}\mathbf{X} - \frac{\lambda_e^4}{\lambda^2}\mathbf{X}^\mathsf{T}\mathbf{U}_k\mathbf{B}_k\mathbf{W}_k\boldsymbol{\Delta}_k\mathbf{W}_k^\mathsf{T}\mathbf{B}_k^\mathsf{T}\mathbf{U}_k^\mathsf{T}\mathbf{X}, \\ \frac{\partial^2 \mathrm{pl}}{\partial \boldsymbol{\beta}\partial \lambda^2} &\approx \frac{\lambda_e^2}{\lambda^2}\mathbf{X}^\mathsf{T}\mathbf{A}\mathbf{x}^*_k(\boldsymbol{\theta}) - \frac{\lambda_e^2}{\lambda^2}\mathbf{X}^\mathsf{T}\mathbf{U}_k\mathbf{B}_k\mathbf{W}_k\boldsymbol{\Delta}_k\mathbf{W}_k^\mathsf{T}\mathbf{z}_k, \\ \frac{\partial^2 \mathrm{pl}}{\partial \boldsymbol{\beta}\partial \rho} &\approx -\lambda_e^2\,\mathbf{X}^\mathsf{T}\mathbf{A}\left( \mathrm{d}\boldsymbol{\Sigma}(\rho) \right)\mathbf{V}_k\mathbf{z}_k + \lambda_e^2\,\mathbf{X}^\mathsf{T}\mathbf{U}_k\mathbf{B}_k\mathbf{W}_k\boldsymbol{\Delta}_k\mathbf{W}_k^\mathsf{T}\mathbf{V}_k^\mathsf{T}\left( \mathrm{d}\boldsymbol{\Sigma}(\rho) \right)\mathbf{V}_k\mathbf{z}_k, \\ \frac{\partial^2 \mathrm{pl}}{\partial \boldsymbol{\beta}\partial \lambda_e^2} &\approx -\mathbf{X}^\mathsf{T}\mathbf{z}_0 - \frac{\lambda_e^2}{\lambda^2}\mathbf{X}^\mathsf{T}\mathbf{A}\boldsymbol{\Sigma}(\rho)\mathbf{A}^\mathsf{T}\mathbf{z}_0 + \frac{\lambda_e^2}{\lambda^2}\mathbf{X}^\mathsf{T}\mathbf{U}_k\mathbf{B}_k\mathbf{W}_k\boldsymbol{\Delta}_k\mathbf{W}_k^\mathsf{T}\mathbf{B}_k^\mathsf{T}\mathbf{U}_k^\mathsf{T}\mathbf{z}_0, \\ \frac{\partial^2 \mathrm{pl}}{\partial \lambda^4} &\approx -\frac{n}{2\lambda^4} + \frac{1}{\lambda^2}\Vert \mathbf{z}_k\Vert_2^2 - \frac{1}{\lambda^2}\mathbf{z}_k^\mathsf{T}\mathbf{W}_k\boldsymbol{\Delta}_k\mathbf{W}_k^\mathsf{T}\mathbf{z}_k, \\ \frac{\partial^2 \mathrm{pl}}{\partial \lambda^2\partial \rho} &\approx -\frac{1}{2}\mathbf{z}_k^\mathsf{T}\mathbf{V}_k^\mathsf{T}\left( \mathrm{d}\boldsymbol{\Sigma}(\rho) \right)\mathbf{V}_k\mathbf{z}_k + \mathbf{z}_k^\mathsf{T}\mathbf{V}_k^\mathsf{T}\left( \mathrm{d}\boldsymbol{\Sigma}(\rho) \right)\mathbf{V}_k\mathbf{W}_k\boldsymbol{\Delta}_k\mathbf{W}_k^\mathsf{T}\mathbf{z}_k, \\ \frac{\partial^2 \mathrm{pl}}{\partial \lambda^2\partial \lambda_e^2} &\approx -\frac{1}{\lambda^2}\mathbf{z}_k^\mathsf{T}\mathbf{B}_k^\mathsf{T}\mathbf{U}_k^\mathsf{T}\mathbf{z}_0 + \frac{1}{\lambda^2}\mathbf{z}_k^\mathsf{T}\mathbf{W}_k\boldsymbol{\Delta}_k\mathbf{W}_k^\mathsf{T}\mathbf{B}_k^\mathsf{T}\mathbf{U}_k^\mathsf{T}\mathbf{z}_0, \\ \frac{\partial^2 \mathrm{pl}}{\partial \rho^2} &\approx -\frac{1}{2}\widehat{\mathrm{d}^2\mathrm{L}} + \frac{\lambda^2}{2}\mathbf{z}_k^\mathsf{T}\mathbf{V}_k^\mathsf{T}\left( \mathrm{d}^2\boldsymbol{\Sigma}(\rho) \right)\mathbf{V}_k\mathbf{z}_k - \lambda^2\,\mathbf{z}_k^\mathsf{T}\mathbf{V}_k^\mathsf{T}\left( \mathrm{d}\boldsymbol{\Sigma}(\rho) \right)\mathbf{V}_k\mathbf{W}_k\boldsymbol{\Delta}_k\mathbf{W}_k^\mathsf{T}\mathbf{V}_k^\mathsf{T}\left( \mathrm{d}\boldsymbol{\Sigma}(\rho) \right)\mathbf{V}_k\mathbf{z}_k, \\ \frac{\partial^2 \mathrm{pl}}{\partial \rho\partial \lambda_e^2} &\approx \mathbf{z}_k^\mathsf{T}\mathbf{V}_k^\mathsf{T}\left( \mathrm{d}\boldsymbol{\Sigma}(\rho) \right)\mathbf{A}^\mathsf{T}\mathbf{z}_0 - \mathbf{z}_k^\mathsf{T}\mathbf{V}_k^\mathsf{T}\left( \mathrm{d}\boldsymbol{\Sigma}(\rho) \right)\mathbf{V}_k\mathbf{W}_k\boldsymbol{\Delta}_k\mathbf{W}_k^\mathsf{T}\mathbf{B}_k^\mathsf{T}\mathbf{U}_k^\mathsf{T}\mathbf{z}_0, \\ \frac{\partial^2 \mathrm{pl}}{\partial \lambda_e^4} &\approx -\frac{p}{2\lambda_e^4} + \frac{1}{\lambda^2}\mathbf{z}_0^\mathsf{T}\mathbf{A}\boldsymbol{\Sigma}(\rho)\mathbf{A}^\mathsf{T}\mathbf{z}_0 - \frac{1}{\lambda^2}\mathbf{z}_0^\mathsf{T}\mathbf{U}_k\mathbf{B}_k\mathbf{W}_k\boldsymbol{\Delta}_k\mathbf{W}_k^\mathsf{T}\mathbf{B}_k^\mathsf{T}\mathbf{U}_k^\mathsf{T}\mathbf{z}_0, \end{aligned}$$
(26)

where \(\widehat{\mathrm{d}^2\mathrm{L}}\) is a numerical approximation to \(\mathrm{d}^2\mathrm{L}\). We do not use this approximation in our current computations, but hope to use it in future work.

Additional tables from the simulation study

In this section, we provide additional results from the simulation studies. Table 6 evaluates parameter estimation in the first simulation study for both the SPDE and Kryging methods; Tables 7 and 8 do the same for the second and third simulation studies. The results are consistent with those reported in Sect. 4. SPDE estimates the nugget parameter \(\tau^2\) better, while Kryging estimates the partial sill parameter \(\sigma^2\) better. Both methods do equally well in estimating the mean parameter \(\beta\) and the spatial range parameter \(\rho\).

Table 7 RMSE in estimating the parameters for SPDE and Kryging under different parametric settings and different choices of k as in the second simulation study. The true values for the parameters were (44.49, 3, 0.5, 0.05), (44.49, 3, 0.5, 0.2), (44.49, 1.5, 0.5, 0.1) and (44.49, 6, 0.5, 0.1) for settings 1 through 4, respectively. The figures in brackets indicate standard error
Table 8 RMSE in estimating the parameters for SPDE and Kryging for different choices of underlying grid size and k for the simulation study with irregularly spaced data. The true parameter values were (44.49, 3, 0.5, 0.1). The figures in brackets indicate standard error

Cite this article

Majumder, S., Guan, Y., Reich, B.J. et al. Kryging: geostatistical analysis of large-scale datasets using Krylov subspace methods. Stat Comput 32, 74 (2022). https://doi.org/10.1007/s11222-022-10104-3

