Multivariate functional data modeling with time-varying clustering. (English) Zbl 1479.62096

Summary: We consider the setting of multivariate functional data collected over time at each of a set of sites. Our objective is to implement model-based clustering of the functions across the sites where we allow such clustering to vary over time. Anticipating dependence between the functions within a site as well as across sites, we model the collection of functions using a multivariate Gaussian process. With many sites and several functions at each site, we use dimension reduction to provide a computationally manageable stochastic process specification. To jointly cluster the functions, we use the Dirichlet process which enables shared labeling of the functions across the sites. Specifically, we cluster functions based on their response to exogenous variables. Though the functions arise over continuous time, clustering in continuous time is extremely computationally demanding and not of practical interest. Therefore, we employ partitioning of the timescale to capture time-varying clustering. Our illustrative setting is bivariate, monitoring ozone and \(\mathrm{PM_{10}}\) levels over time for one year at a set of monitoring sites. The data we work with is from 24 monitoring sites in Mexico City for 2017 which record hourly ozone and \(\mathrm{PM_{10}}\) levels. Hence, we have 48 functions to work with across 8760 hours. We provide a Gaussian process model for each function using continuous-time meteorological variables as regressors along with adjustment for daily periodicity. We interpret the similarity of functions in terms of their shape, captured through site-specific coefficients, and use these coefficients to develop the clustering.
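
To make the data layout in the summary concrete, the following minimal sketch arranges 24 sites by 2 pollutants as 48 functions over 8760 hours, partitions the timescale into blocks, and estimates daily-periodicity coefficients separately for each function within each block. It is an illustration only, not the authors' model: ordinary least squares stands in for the Bayesian multivariate Gaussian process, no Dirichlet process clustering is performed, the meteorological regressors are omitted, the data are synthetic, and the names `N_BLOCKS`, `daily_design`, and `blockwise_coefficients` are hypothetical.

```python
import numpy as np

N_SITES, N_POLLUTANTS, N_HOURS = 24, 2, 365 * 24  # 48 functions over 8760 hours
N_BLOCKS = 12  # hypothetical number of time blocks; not taken from the paper


def daily_design(hours):
    """Design matrix: intercept plus one sine/cosine pair with a 24-hour period."""
    angle = 2.0 * np.pi * hours / 24.0
    return np.column_stack([np.ones_like(angle), np.sin(angle), np.cos(angle)])


def blockwise_coefficients(y, n_blocks=N_BLOCKS):
    """Least-squares coefficients for every function within every time block.

    y has shape (n_functions, n_hours); the result has shape
    (n_blocks, n_functions, 3).
    """
    hours = np.arange(y.shape[1], dtype=float)
    blocks = np.array_split(np.arange(y.shape[1]), n_blocks)
    coefs = []
    for idx in blocks:
        X = daily_design(hours[idx])
        # Solve all functions at once; beta has shape (3, n_functions).
        beta, *_ = np.linalg.lstsq(X, y[:, idx].T, rcond=None)
        coefs.append(beta.T)
    return np.stack(coefs)


# Synthetic stand-in data (assumption): a daily sine wave per function plus noise.
rng = np.random.default_rng(0)
n_functions = N_SITES * N_POLLUTANTS
hours = np.arange(N_HOURS, dtype=float)
amplitude = rng.uniform(0.5, 2.0, size=(n_functions, 1))
y = amplitude * np.sin(2.0 * np.pi * hours / 24.0)
y = y + 0.1 * rng.standard_normal((n_functions, N_HOURS))

coefs = blockwise_coefficients(y)
```

Each slice `coefs[b]` holds one coefficient vector per function for block `b`; in the spirit of the summary, such block-specific coefficient vectors are the quantities on which a clustering step would operate, so cluster membership can differ from block to block.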

MSC:

62R10 Functional data analysis
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62P12 Applications of statistics to environmental and related topics
60G15 Gaussian processes
60J25 Continuous-time Markov processes on general state spaces
86A10 Meteorology and atmospheric physics

Software:

spBayes; fda (R); AS 136

References:

[1] Abraham, C.; Cornillon, P-A; Matzner-Løber, E.; Molinari, N., Unsupervised curve clustering using B-splines, Scand J Stat, 30, 3, 581-595 (2003) · Zbl 1039.91067 · doi:10.1111/1467-9469.00350
[2] Aguilar, O.; West, M., Bayesian dynamic factor models and portfolio allocation, J Bus Econ Stat, 18, 3, 338-357 (2000)
[3] Ali, AM; Darvishzadeh, R.; Skidmore, AK, Retrieval of specific leaf area from Landsat-8 surface reflectance data using statistical and physical models, IEEE J Sel Top Appl Earth Observ Remote Sens, 10, 8, 3529-3536 (2017) · doi:10.1109/JSTARS.2017.2690623
[4] Banerjee, S.; Carlin, BP; Gelfand, AE, Hierarchical modeling and analysis for spatial data (2014), Amsterdam: CRC Press · doi:10.1201/b17115
[5] Banerjee, S.; Gelfand, AE; Finley, AO; Sang, H., Gaussian predictive process models for large spatial data sets, J R Stat Soc Ser B (Stat Methodol), 70, 4, 825-848 (2008) · Zbl 1533.62065 · doi:10.1111/j.1467-9868.2008.00663.x
[6] Bernardo, J.; Bayarri, M.; Berger, J.; Dawid, A.; Heckerman, D.; Smith, A.; West, M., Bayesian factor regression models in the “large p, small n” paradigm, Bayesian Stat, 7, 733-742 (2003)
[7] Berrocal, VJ; Gelfand, AE; Holland, DM, A spatio-temporal downscaler for output from numerical models, J Agric Biol Environ Stat, 15, 2, 176-197 (2010) · Zbl 1306.62243 · doi:10.1007/s13253-009-0004-z
[8] Bhattacharya, A.; Dunson, DB, Sparse Bayesian infinite factor models, Biometrika, 291-306 (2011) · Zbl 1215.62025
[9] Brockwell, PJ; Davis, R.; Yang, Y., Continuous-time Gaussian autoregression, Stat Sin, 17, 63-80 (2007) · Zbl 1145.62070
[10] Christensen, WF; Amemiya, Y., Latent variable analysis of multivariate spatial data, J Am Stat Assoc, 97, 457, 302-317 (2002) · Zbl 1073.62537 · doi:10.1198/016214502753479437
[11] Cocchi, D.; Greco, F.; Trivisano, C., Hierarchical space-time modelling of \(\mathrm{PM_{10}}\) pollution, Atmos Environ, 41, 3, 532-542 (2007) · doi:10.1016/j.atmosenv.2006.08.032
[12] Datta, A.; Banerjee, S.; Finley, AO; Gelfand, AE, Hierarchical nearest-neighbor Gaussian process models for large geostatistical datasets, J Am Stat Assoc, 111, 514, 800-812 (2016) · doi:10.1080/01621459.2015.1044091
[13] Escobar, MD; West, M., Bayesian density estimation and inference using mixtures, J Am Stat Assoc, 90, 430, 577-588 (1995) · Zbl 0826.62021 · doi:10.1080/01621459.1995.10476550
[14] Ferguson, TS, A Bayesian analysis of some nonparametric problems, Ann Stat, 209-230 (1973) · Zbl 0255.62037
[15] Gelfand, AE; Kim, H-J; Sirmans, C.; Banerjee, S., Spatial modeling with spatially varying coefficient processes, J Am Stat Assoc, 98, 462, 387-396 (2003) · Zbl 1041.62041 · doi:10.1198/016214503000170
[16] Gervini, D., Warped functional regression, Biometrika, 102, 1, 1-14 (2014) · Zbl 1345.62014 · doi:10.1093/biomet/asu054
[17] Geweke, J.; Zhou, G., Measuring the pricing error of the arbitrage pricing theory, Rev Financ Stud, 9, 2, 557-587 (1996) · doi:10.1093/rfs/9.2.557
[18] Han, S.; Kerekes, J.; Higbee, S.; Siegel, L.; Pertica, A., Band selection method for subpixel target detection using only the target reflectance signature, Appl Opt, 58, 11, 2981-2993 (2019) · doi:10.1364/AO.58.002981
[19] Hartigan, JA; Wong, MA, Algorithm AS 136: a k-means clustering algorithm, J R Stat Soc Ser C (Appl Stat), 28, 1, 100-108 (1979) · Zbl 0447.62062
[20] Hogan, JW; Tchernis, R., Bayesian factor analysis for spatially correlated data, with application to summarizing area-level material deprivation from census data, J Am Stat Assoc, 99, 466, 314-324 (2004) · Zbl 1117.62354 · doi:10.1198/016214504000000296
[21] Huang, G.; Lee, D.; Scott, EM, Multivariate space-time modelling of multiple air pollutants and their health effects accounting for exposure uncertainty, Stat Med, 37, 7, 1134-1148 (2018) · doi:10.1002/sim.7570
[22] Jacques, J.; Preda, C., Model-based clustering for multivariate functional data, Comput Stat Data Anal, 71, 92-106 (2014) · Zbl 1471.62096 · doi:10.1016/j.csda.2012.12.004
[23] Lopes, HF; West, M., Bayesian model assessment in factor analysis, Stat Sin, 41-67 (2004) · Zbl 1035.62060
[24] Morris, JS, Functional regression, Annu Rev Stat Appl, 2, 321-359 (2015) · doi:10.1146/annurev-statistics-010814-020413
[25] Petrone, S.; Guindani, M.; Gelfand, AE, Hybrid Dirichlet mixture models for functional data, J R Stat Soc Ser B (Stat Methodol), 71, 4, 755-782 (2009) · Zbl 1248.62079 · doi:10.1111/j.1467-9868.2009.00708.x
[26] Ramsay, J., When the data are functions, Psychometrika, 47, 4, 379-396 (1982) · Zbl 0512.62004 · doi:10.1007/BF02293704
[27] Ramsay, JO; Dalzell, C., Some tools for functional data analysis, J R Stat Soc Ser B (Stat Methodol), 53, 3, 539-561 (1991) · Zbl 0800.62314
[28] Ramsay, JO; Silverman, BW, Applied functional data analysis: methods and case studies (2007), Berlin: Springer · Zbl 1011.62002
[29] Sahu, SK; Gelfand, AE; Holland, DM, High-resolution space-time ozone modeling for assessing trends, J Am Stat Assoc, 102, 480, 1221-1234 (2007) · Zbl 1332.86014 · doi:10.1198/016214507000000031
[30] Schmutz, A.; Jacques, J.; Bouveyron, C.; Cheze, L.; Martin, P., Clustering multivariate functional data in group-specific functional subspaces, Comput Stat, 35, 1101-1131 (2020) · Zbl 1505.62360 · doi:10.1007/s00180-020-00958-4
[31] Seber, GA, Multivariate observations (2009), New York: Wiley
[32] Sethuraman, J., A constructive definition of Dirichlet priors, Stat Sin, 639-650 (1994) · Zbl 0823.62007
[33] Shi, JQ; Choi, T., Gaussian process regression analysis for functional data (2011), Boca Raton: Chapman and Hall/CRC · Zbl 1273.60005 · doi:10.1201/b11038
[34] Sugar, CA; James, GM, Finding the number of clusters in a dataset: an information-theoretic approach, J Am Stat Assoc, 98, 463, 750-763 (2003) · Zbl 1046.62064 · doi:10.1198/016214503000000666
[35] Telesca, D.; Inoue, LYT, Bayesian hierarchical curve registration, J Am Stat Assoc, 103, 481, 328-339 (2008) · Zbl 1471.62560 · doi:10.1198/016214507000001139
[36] Ullah, S.; Finch, CF, Applications of functional data analysis: a systematic review, BMC Med Res Methodol, 13, 1, 43 (2013) · doi:10.1186/1471-2288-13-43
[37] Wang, B.; Chen, T., Gaussian process regression with multiple response variables, Chemometr Intell Lab Syst, 142, 159-165 (2015) · doi:10.1016/j.chemolab.2015.01.016
[38] Wang, J-L; Chiou, J-M; Müller, H-G, Functional data analysis, Annu Rev Stat Appl, 3, 257-295 (2016) · doi:10.1146/annurev-statistics-041715-033624
[39] West, M.; Harrison, J., Bayesian forecasting and dynamic models (1997), Berlin: Springer · Zbl 0871.62026
[40] White, P.; Porcu, E., Nonseparable covariance models on circles cross time: a study of Mexico City ozone, Environmetrics, 30, 5, e2558 (2019) · doi:10.1002/env.2558
[41] White, PA; Gelfand, AE; Rodrigues, ER; Tzintzun, G., Pollution state modelling for Mexico City, J R Stat Soc Ser A (Stat Soc), 182, 3, 1039-1060 (2019) · doi:10.1111/rssa.12444
[42] Zhang, H., Inconsistent estimation and asymptotically equal interpolations in model-based geostatistics, J Am Stat Assoc, 99, 465, 250-261 (2004) · Zbl 1089.62538 · doi:10.1198/016214504000000241
[43] Zhang, X.; Nott, DJ; Yau, C.; Jasra, A., A sequential algorithm for fast fitting of Dirichlet process mixture models, J Comput Graph Stat, 23, 4, 1143-1162 (2014) · doi:10.1080/10618600.2013.870906