Unsupervised learning on U.S. weather forecast performance. (English) Zbl 07734156

Summary: Climate events and weather predictions strongly affect human activities. To assess the accuracy of weather prediction, we applied functional principal component analysis (FPCA) to investigate the main patterns of variance in U.S. weather prediction error over a period of 3 years. We then grouped the U.S. states by similarity in weather forecast performance using two types of functional clustering approaches: the filtering method and the model-based method. The strengths and weaknesses of each clustering method were assessed through simulation studies. The clustering approaches were then applied to U.S. weather data from 2014 to 2017. Clustering revealed cluster-specific patterns visually, and the cluster-to-cluster differences were quantified to identify the most and least predictable U.S. states.
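
The workflow named in the summary (FPCA followed by filtering-type clustering) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes every curve is observed on a common, equally spaced grid, replaces basis-smoothed FPCA with a plain SVD of the centered data matrix, and uses k-means on the leading FPC scores as one common form of the filtering approach. The data are synthetic, and the function names `fpca` and `kmeans` are hypothetical helpers introduced for illustration. The model-based method is not shown.

```python
import numpy as np


def fpca(curves, n_components=2):
    """Discretized FPCA for curves of shape (n_curves, n_timepoints).

    Assumption: all curves share one equally spaced grid, so the SVD of
    the centered data matrix approximates the functional eigen-analysis.
    """
    mean_curve = curves.mean(axis=0)
    centered = curves - mean_curve
    # Rows of vt are the discretized principal component functions.
    _, sing_vals, vt = np.linalg.svd(centered, full_matrices=False)
    explained = sing_vals**2 / np.sum(sing_vals**2)
    components = vt[:n_components]
    scores = centered @ components.T  # FPC scores, one row per curve
    return mean_curve, components, scores, explained[:n_components]


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd-style k-means on the rows of `points`."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Synthetic "forecast error" curves (illustrative only, not the U.S. data):
# group A has small errors, group B has larger, more seasonal errors.
rng = np.random.default_rng(42)
t = np.linspace(0.0, 1.0, 120)
n_per_group = 10
group_a = 1.0 + 0.2 * np.sin(2 * np.pi * t) + 0.05 * rng.standard_normal((n_per_group, t.size))
group_b = 3.0 + 1.0 * np.sin(2 * np.pi * t) + 0.05 * rng.standard_normal((n_per_group, t.size))
curves = np.vstack([group_a, group_b])

# Filtering approach: reduce each curve to a few FPC scores, then cluster the scores.
mean_curve, components, scores, explained = fpca(curves, n_components=2)
labels = kmeans(scores, k=2)

print("Share of variance explained by the leading components:", explained)
print("Cluster labels:", labels)
```

In this sketch the first component captures the level and amplitude difference between the two synthetic groups, so clustering the scores recovers the groups. With real forecast-error curves, smoothing and the choice of the number of components and clusters would need separate care.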

MSC:

62-08 Computational methods for problems pertaining to statistics
