
Marginally parameterized spatio-temporal models and stepwise maximum likelihood estimation. (English) Zbl 07345934

Summary: Learning the complex features of large spatio-temporal data often requires models with a large number of parameters. However, inference is often infeasible because of the computational and memory costs of maximum likelihood estimation (MLE). The class of marginally parameterized (MP) models is introduced. For these models, estimation can be performed efficiently by maximizing a sequence of marginal likelihood functions, an approach called stepwise maximum likelihood estimation (SMLE). Conditions under which the stepwise estimators are consistent are provided, and it is shown that this class of models includes the diagonal vector autoregressive moving average model. It is demonstrated that the parameters of this model can be estimated at least three orders of magnitude faster with SMLE than with MLE, with only a small loss in statistical efficiency. An MP model is applied to a global spatio-temporal climate data set of over five million data points, and it is shown that estimation takes less than one hour on a laptop with a 2.9 GHz dual-core processor.
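The stepwise idea can be illustrated on the simplest member of the diagonal VARMA family, a diagonal VAR(1) model y_t = Phi y_{t-1} + eps_t with Phi = diag(phi_1, ..., phi_d) and eps_t ~ N(0, Sigma). The sketch below rests on our own assumptions and is not the authors' algorithm: zero-mean data, the conditional (not exact) Gaussian likelihood, and a two-step split in which each phi_i is first estimated from the marginal AR(1) likelihood of component i alone, after which Sigma is estimated from the residuals with the phi_i held fixed. All function names are hypothetical.

```python
import random

def simulate_diag_var1(phis, chol, n_steps, seed=0):
    """Simulate y_t = diag(phis) y_{t-1} + eps_t, eps_t = L z_t, z_t ~ N(0, I),
    so that Cov(eps_t) = L L^T (chol is the lower-triangular factor L)."""
    rng = random.Random(seed)
    d = len(phis)
    y = [0.0] * d
    series = []
    for _ in range(n_steps):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        eps = [sum(chol[i][k] * z[k] for k in range(i + 1)) for i in range(d)]
        y = [phis[i] * y[i] + eps[i] for i in range(d)]
        series.append(y)
    return series

def marginal_ar1_mle(x):
    """Conditional Gaussian MLE of phi for a zero-mean AR(1):
    the least-squares slope of x_t on x_{t-1}."""
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den

def stepwise_mle(series):
    d = len(series[0])
    cols = [[row[i] for row in series] for i in range(d)]
    # Step 1: d independent one-parameter problems, one per marginal series.
    phis = [marginal_ar1_mle(c) for c in cols]
    # Step 2: plug in phi-hat and take the conditional MLE of Sigma,
    # i.e. the residual cross-products averaged over the n residuals.
    resid = [[cols[i][t] - phis[i] * cols[i][t - 1] for i in range(d)]
             for t in range(1, len(series))]
    n = len(resid)
    sigma = [[sum(r[i] * r[j] for r in resid) / n for j in range(d)]
             for i in range(d)]
    return phis, sigma

# Usage on simulated data (true Sigma has unit variances, Sigma[0][1] = 0.6).
phis_true = [0.9, 0.5, -0.3]
chol = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0], [0.3, 0.4, 0.866]]
data = simulate_diag_var1(phis_true, chol, n_steps=20000, seed=1)
phis_hat, sigma_hat = stepwise_mle(data)
print([round(p, 3) for p in phis_hat], round(sigma_hat[0][1], 3))
```

Step 1 replaces one joint optimization over all d + d(d+1)/2 parameters with d separate one-parameter fits that could run in parallel. Because it ignores the cross-correlation in Sigma when estimating each phi_i, it is in general less efficient than the joint (seemingly-unrelated-regression type) MLE, a small-scale analogue of the efficiency trade-off described in the summary.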

MSC:

62M30 Inference from spatial processes
62P12 Applications of statistics to environmental and related topics
86A32 Geostatistics
62-08 Computational methods for problems pertaining to statistics

Software:

Stem; FRK
