A \(k\)-means procedure based on a Mahalanobis type distance for clustering multivariate functional data. (English) Zbl 1427.62054

Summary: This paper proposes a clustering procedure for samples of multivariate functions in \((L^2(I))^{J}\), with \(J\ge 1\). The method is based on a \(k\)-means algorithm in which the distance between curves is measured with a metric that generalizes the Mahalanobis distance to Hilbert spaces and accounts for the correlation and the variability along all the components of the functional data. The procedure is studied in simulations and compared with \(k\)-means based on other distances typically adopted for clustering multivariate functional data. In these simulations, the \(k\)-means algorithm with the generalized Mahalanobis distance gives the best clustering performance, in terms of both the mean and the standard deviation of the number of misclassified curves. Finally, the method is applied to two case studies, concerning ECG signals and growth curves, which confirm and strengthen the simulation results.
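To illustrate the kind of procedure the summary describes, the following Python sketch runs a \(k\)-means loop on discretized curves with a regularized Mahalanobis-type distance. It is a minimal sketch under stated assumptions, not the authors' implementation. The summary does not give the formula, so the sketch assumes the distance has the form \(d_p^2(f,g)=\sum_k \langle f-g,\varphi_k\rangle^2/(\lambda_k+1/p)\), where \((\lambda_k,\varphi_k)\) are eigenpairs of the covariance operator and \(p>0\) is a tuning parameter. It further assumes that all curves are observed on a common uniform grid, that the \(J\) components are concatenated into one vector, and that the covariance is estimated once from the whole sample. The function names and the farthest-point initialization are hypothetical choices made for the example.

```python
import numpy as np


def covariance_eigen(X, dt):
    """Eigenpairs of the discretized sample covariance operator (assumed setup).

    X has shape (n, m): n curves, each the concatenation of its J components
    sampled on a common uniform grid with spacing dt.
    """
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (X.shape[0] - 1)
    w, V = np.linalg.eigh(cov)
    # Matrix eigenvalues times dt approximate the operator eigenvalues;
    # tiny negative values caused by round-off are clipped to zero.
    lam = np.clip(w, 0.0, None) * dt
    return lam, V


def gm_dist_sq(diff, lam, V, p, dt):
    """Assumed squared distance: sum_k <diff, phi_k>^2 / (lam_k + 1/p)."""
    # sqrt(dt) * (diff @ V) approximates the L2 inner products with the
    # eigenfunctions phi_k = V[:, k] / sqrt(dt).
    scores = np.sqrt(dt) * (diff @ V)
    return np.sum(scores**2 / (lam + 1.0 / p), axis=-1)


def farthest_point_init(X, k, lam, V, p, dt):
    """Deterministic start (a convenience of this sketch): begin with the first
    curve, then repeatedly add the curve farthest from the centers chosen so far."""
    idx = [0]
    d_min = gm_dist_sq(X - X[0], lam, V, p, dt)
    for _ in range(1, k):
        nxt = int(d_min.argmax())
        idx.append(nxt)
        d_min = np.minimum(d_min, gm_dist_sq(X - X[nxt], lam, V, p, dt))
    return X[idx].copy()


def gm_kmeans(X, k, p, dt, n_iter=50):
    """k-means with the assumed regularized Mahalanobis-type distance."""
    lam, V = covariance_eigen(X, dt)
    centers = farthest_point_init(X, k, lam, V, p, dt)
    labels = np.full(X.shape[0], -1)
    for _ in range(n_iter):
        # Distances from every curve to every center, shape (n, k).
        d = gm_dist_sq(X[:, None, :] - centers[None, :, :], lam, V, p, dt)
        new_labels = d.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for c in range(k):
            members = X[labels == c]
            if len(members):
                # The distance is a quadratic form, so the cluster mean
                # minimizes the within-cluster sum of squared distances.
                centers[c] = members.mean(axis=0)
    return labels, centers


# Toy data: two groups of bivariate curves (J = 2) on 20 grid points.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 20)
dt = t[1] - t[0]
mean_a = np.concatenate([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
mean_b = mean_a + 3.0
X = np.vstack([
    mean_a + 0.1 * rng.standard_normal((20, 40)),
    mean_b + 0.1 * rng.standard_normal((20, 40)),
])
labels, centers = gm_kmeans(X, k=2, p=1.0, dt=dt)
```

In this toy setting the two groups are far apart, so the loop should separate them after one reassignment. The value of `p` matters in practice: when the number of curves is smaller than the number of grid points, a very large `p` makes all sample curves nearly equidistant, so a moderate value is used here.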

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62H20 Measures of association (correlation, canonical correlation, etc.)
62P10 Applications of statistics to biology and medical sciences; meta analysis
