Functional data clustering: a survey. (English) Zbl 1414.62018

Summary: Clustering techniques for functional data are reviewed. Four groups of clustering algorithms for functional data are proposed. The first group consists of methods working directly on the evaluation points of the curves. The second group is defined by filtering methods, which first approximate the curves in a finite basis of functions and then perform clustering using the basis expansion coefficients. The third group is composed of methods which simultaneously perform dimensionality reduction of the curves and clustering, leading to a functional representation of the data that depends on the clusters. The last group consists of distance-based methods, which use clustering algorithms based on specific distances for functional data. A software review as well as an illustration of the application of these algorithms on real data are presented.
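
Illustration (not part of the reviewed paper): the second group of methods described in the summary, filtering followed by clustering of the basis coefficients, can be sketched as below. This is a minimal sketch under stated assumptions: the truncated Fourier basis, the synthetic curves, the plain Lloyd k-means, and all function names (`fourier_design`, `basis_coefficients`, `kmeans`) are hypothetical choices made for this example and are not taken from the survey.

```python
import numpy as np


def fourier_design(t, n_harmonics):
    """Design matrix of a truncated Fourier basis evaluated at points t in [0, 1]."""
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        cols.append(np.sin(2 * np.pi * k * t))
        cols.append(np.cos(2 * np.pi * k * t))
    return np.column_stack(cols)


def basis_coefficients(curves, t, n_harmonics=3):
    """Filtering step: least-squares basis coefficients, one row per curve."""
    B = fourier_design(t, n_harmonics)
    coef, *_ = np.linalg.lstsq(B, curves.T, rcond=None)
    return coef.T


def kmeans(X, k, n_iter=50, seed=0):
    """Clustering step: plain Lloyd k-means on the rows of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every center.
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Recompute centers; an empty cluster keeps its previous center.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Synthetic curves (assumption): two noisy groups observed on a common grid.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 50)
group_a = np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal((20, t.size))
group_b = np.cos(2 * np.pi * t) + 0.1 * rng.standard_normal((20, t.size))
curves = np.vstack([group_a, group_b])

coefs = basis_coefficients(curves, t)   # shape (40, 7): one coefficient vector per curve
labels = kmeans(coefs, k=2)             # cluster the coefficients, not the raw samples
```

Clustering the 7 coefficients instead of the 50 raw evaluation points is what separates this group from the first one; a B-spline or wavelet basis could replace the Fourier basis without changing the two-step structure.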

MSC:

62-07 Data analysis (statistics) (MSC2010)
62M99 Inference from stochastic processes
62H30 Classification and discrimination; cluster analysis (statistical aspects)

References:

[1] Abraham C, Cornillon PA, Matzner-Løber E, Molinari N (2003) Unsupervised curve clustering using B-splines. Scand J Stat Theory Appl 30(3):581-595 · Zbl 1039.91067 · doi:10.1111/1467-9469.00350
[2] Akaike H (1974) A new look at the statistical model identification. IEEE Trans Autom Control 19:716-723 (system identification and time-series analysis) · Zbl 0314.62039
[3] Antoniadis A, Beder JH (1989) Joint estimation of the mean and the covariance of a Banach valued Gaussian vector. Statistics 20(1):77-93 · Zbl 0684.62065 · doi:10.1080/02331888908802145
[4] Banfield J, Raftery A (1993) Model-based Gaussian and non-Gaussian clustering. Biometrics 49:803-821 · Zbl 0794.62034 · doi:10.2307/2532201
[5] Bergé L, Bouveyron C, Girard S (2012) HDclassif : an R package for model-based clustering and discriminant analysis of high-dimensional data. J Stat Softw 42(6):1-29
[6] Besse P (1979) Etude descriptive d’un processus. Thèse de doctorat 3ème cycle, Université Paul Sabatier, Toulouse
[7] Biernacki C, Celeux G, Govaert G (2000) Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans Pattern Anal Mach Intell 22(4):719-725 · doi:10.1109/34.865189
[8] Bosq D (2000) Linear processes in function spaces, Lecture Notes in Statistics, vol 149. Springer, New York (theory and applications) · Zbl 0962.60004
[9] Boullé M (2012) Functional data clustering via piecewise constant nonparametric density estimation. Pattern Recognit 45(12):4389-4401 · Zbl 1248.68398 · doi:10.1016/j.patcog.2012.05.016
[10] Boumaza R (1980) Contribution à l’étude descriptive d’une fonction aléatoire qualitative. PhD thesis, Université Paul Sabatier, Toulouse, France
[11] Bouveyron C, Brunet C (2013) Model-based clustering of high-dimensional data : a review. Technical report · Zbl 1471.62032
[12] Bouveyron C, Jacques J (2011) Model-based clustering of time series in group-specific functional subspaces. Adv Data Anal Classif 5(4):281-300 · Zbl 1274.62416 · doi:10.1007/s11634-011-0095-6
[13] Bouveyron C, Girard S, Schmid C (2007) High dimensional data clustering. Comput Stat Data Anal 52: 502-519 · Zbl 1452.62433
[14] Cardot H, Ferraty F, Sarda P (1999) Functional linear model. Stat Probab Lett 45:11-22 · Zbl 0962.62081 · doi:10.1016/S0167-7152(99)00036-X
[15] Cattell R (1966) The scree test for the number of factors. Multivar Behav Res 1(2):245-276 · doi:10.1207/s15327906mbr0102_10
[16] Celeux G, Govaert G (1995) Gaussian parsimonious clustering models. J Pattern Recognit Soc 28:781-793 · doi:10.1016/0031-3203(94)00125-6
[17] Chiou JM, Li PL (2007) Functional clustering and identifying substructures of longitudinal data. J R Stat Soc Ser B Stat Methodol 69(4):679-699 · Zbl 07555371 · doi:10.1111/j.1467-9868.2007.00605.x
[18] Coifman R, Wickerhauser M (1992) Entropy-based algorithms for best basis selection. IEEE Trans Inf Theory 38(2):713-718 · Zbl 0849.94005 · doi:10.1109/18.119732
[19] Cox T, Cox M (2001) Multidimensional scaling. Chapman and Hall, New York · Zbl 1004.91067
[20] Cuesta-Albertos J, Fraiman R (2000) Impartial trimmed k-means for functional data. Comput Stat Data Anal 51:4864-4877 · Zbl 1162.62377 · doi:10.1016/j.csda.2006.07.011
[21] Dauxois J, Pousse A, Romain Y (1982) Asymptotic theory for the principal component analysis of a vector random function: some applications to statistical inference. J Multivar Anal 12(1):136-154 · Zbl 0539.62064 · doi:10.1016/0047-259X(82)90088-4
[22] Delaigle A, Hall P (2010) Defining probability density for a distribution of random functions. Ann Stat 38:1171-1193 · Zbl 1183.62061 · doi:10.1214/09-AOS741
[23] Deville J (1974) Méthodes statistiques et numériques de l’analyse harmonique. Annales de l’INSEE 15:3-101
[24] Escabias M, Aguilera A, Valderrama M (2005) Modeling environmental data by functional principal component logistic regression. Environmetrics 16:95-107 · doi:10.1002/env.696
[25] Ferraty F, Vieu P (2006) Nonparametric functional data analysis, Springer Series in Statistics. Springer, New York · Zbl 1119.62046
[26] Gaffney S (2004) Probabilistic curve-aligned clustering and prediction with mixture models. PhD thesis, Department of Computer Science, University of California, Irvine, USA
[27] Giacofci M, Lambert-Lacroix S, Marot G, Picard F (2012) Wavelet-based clustering for mixed-effects functional models in high dimension. Biometrics (in press) · Zbl 1274.62774
[28] Guyon I, Von Luxburg U, Williamson R (2009) Clustering: science or art. In: NIPS 2009 workshop on clustering theory
[29] Hartigan J, Wong M (1978) Algorithm AS 136: a k-means clustering algorithm. Appl Stat 28:100-108 · Zbl 0447.62062 · doi:10.2307/2346830
[30] Heard N, Holmes C, Stephens D (2006) A quantitative study of gene regulation involved in the immune response of anopheline mosquitoes: an application of Bayesian hierarchical clustering of curves. J Am Stat Assoc 101(473):18-29 · Zbl 1118.62368 · doi:10.1198/016214505000000187
[31] Hébrail G, Hugueney B, Lechevallier Y, Rossi F (2010) Exploratory analysis of functional data via clustering and optimal segmentation. Neurocomputing 73(7-9):1125-1141 · doi:10.1016/j.neucom.2009.11.022
[32] Ieva F, Paganoni A, Pigoli D, Vitelli V (2012) Multivariate functional clustering for the analysis of ECG curves morphology. J R Stat Soc Ser C Appl Stat (in press)
[33] Jacques J, Preda C (2013a) Funclust: a curves clustering method using functional random variable density approximation. Neurocomputing. doi:10.1016/j.neucom.2012.11.042
[34] Jacques J, Preda C (2013b) Model-based clustering for multivariate functional data. Comput Stat Data Anal. doi:10.1016/j.csda.2012.12.004 · Zbl 1471.62096
[35] James G, Sugar C (2003) Clustering for sparsely sampled functional data. J Am Stat Assoc 98(462):397-408 · Zbl 1041.62052 · doi:10.1198/016214503000189
[36] Karhunen K (1947) Über lineare Methoden in der Wahrscheinlichkeitsrechnung. Ann Acad Sci Fennicae Ser A I Math-Phys 1947(37):79 · Zbl 0030.16502
[37] Kayano M, Dozono K, Konishi S (2010) Functional cluster analysis via orthonormalized Gaussian basis expansions and its application. J Classif 27:211-230 · Zbl 1337.62134 · doi:10.1007/s00357-010-9054-8
[38] Kohonen T (1995) Self-organizing maps. Springer, New York · doi:10.1007/978-3-642-97610-0
[39] Lévéder C, Abraham P, Cornillon E, Matzner-Lober E, Molinari N (2004) Discrimination de courbes de prétrissage. In: Chimiométrie 2004, Paris, pp 37-43
[40] Liu X, Yang M (2009) Simultaneous curve registration and clustering for functional data. Comput Stat Data Anal 53:1361-1376 · Zbl 1452.62993 · doi:10.1016/j.csda.2008.11.019
[41] Loève M (1945) Fonctions aléatoires de second ordre. C R Acad Sci Paris 220:469 · Zbl 0063.03612
[42] MATLAB (2010) version 7.10.0 (R2010a) The MathWorks Inc., Natick, Massachusetts · Zbl 1200.93001
[43] McLachlan G, Peel D (2000) Finite mixture models. Wiley Series in Probability and Statistics. Applied Probability and Statistics, Wiley-Interscience, New York · Zbl 0963.62061 · doi:10.1002/0471721182
[44] Olszewski R (2001) Generalized feature extraction for structural pattern recognition in time-series data. PhD thesis, Carnegie Mellon University, Pittsburgh, PA
[45] Peng J, Müller HG (2008) Distance-based clustering of sparsely observed stochastic processes, with applications to online auctions. Ann Appl Stat 2(3):1056-1077 · Zbl 1149.62053 · doi:10.1214/08-AOAS172
[46] Preda C, Saporta G, Lévéder C (2007) PLS classification of functional data. Comput Stat 22(2):223-235 · Zbl 1196.62086 · doi:10.1007/s00180-007-0041-4
[47] R Core Team (2012) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/. ISBN: 3-900051-07-0
[48] Ramsay JO, Silverman BW (2002) Applied functional data analysis. Springer Series in Statistics. Springer, New York (methods and case studies) · Zbl 1011.62002
[49] Ramsay JO, Silverman BW (2005) Functional data analysis, 2nd edn. Springer Series in Statistics. Springer, New York · Zbl 1079.62006
[50] Ray S, Mallick B (2006) Functional clustering by Bayesian wavelet methods. J R Stat Soc Ser B Stat Methodol 68(2):305-332 · Zbl 1100.62058 · doi:10.1111/j.1467-9868.2006.00545.x
[51] Romano E, Giraldo R, Mateu J (2011) Recent advances in functional data analysis and related topics, Springer, chap clustering spatially correlated functional data
[52] Rossi F, Conan-Guez B, El Golli A (2004) Clustering functional data with the SOM algorithm. In: Proceedings of ESANN 2004. Bruges, Belgium, pp 305-312
[53] Saito N, Coifman R (1995) Local discriminant bases and their applications. J Math Imaging Vis 5(4):337-358 · Zbl 0863.94004 · doi:10.1007/BF01250288
[54] Samé A, Chamroukhi F, Govaert G, Aknin P (2011) Model-based clustering and segmentation of time series with changes in regime. Adv Data Anal Classif 5(4):301-322 · Zbl 1274.62427 · doi:10.1007/s11634-011-0096-5
[55] Sangalli L, Secchi P, Vantini S, Vitelli V (2010a) Functional clustering and alignment methods with applications. Commun App Ind Math 1(1):205-224 · Zbl 1329.62289
[56] Sangalli L, Secchi P, Vantini S, Vitelli V (2010b) k-mean alignment for curve clustering. Comput Stat Data Anal 54(5):1219-1233 · Zbl 1464.62153 · doi:10.1016/j.csda.2009.12.008
[57] Saporta G (1981) Méthodes exploratoires d’analyse de données temporelles. Cahiers du BURO 37-38
[58] Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6(2):461-464 · Zbl 0379.62005 · doi:10.1214/aos/1176344136
[59] Secchi P, Vantini S, Vitelli V (2011) Recent advances in functional data analysis and related topics, Springer, chap Spatial Clustering of Functional Data
[60] Serban N, Jiang H (2012) Multilevel functional clustering analysis. Biometrics 68(3):805-814 · Zbl 1272.62085 · doi:10.1111/j.1541-0420.2011.01714.x
[61] Slaets L, Claeskens G, Hubert M (2012) Phase and amplitude-based clustering for functional data. Comput Stat Data Anal 56(7):2360-2374 · Zbl 1252.62066 · doi:10.1016/j.csda.2012.01.017
[62] Sugar C, James G (2003) Finding the number of clusters in a dataset: an information-theoretic approach. J Am Stat Assoc 98(463):750-763 · Zbl 1046.62064
[63] Tarpey T, Kinateder K (2003) Clustering functional data. J Classif 20(1):93-114 · Zbl 1112.62327 · doi:10.1007/s00357-003-0007-3
[64] Tipping ME, Bishop C (1999) Mixtures of principal component analyzers. Neural Comput 11(2):443-482 · doi:10.1162/089976699300016728
[65] Tokushige S, Yadohisa H, Inada K (2007) Crisp and fuzzy k-means clustering algorithms for multivariate functional data. Comput Stat 22:1-16 · Zbl 1196.62089 · doi:10.1007/s00180-006-0013-0
[66] Tuddenham R, Snyder M (1954) Physical growth of California boys and girls from birth to eighteen years. Univ Calif Publ Child Dev 1:188-364
[67] Wahba G (1990) Spline models for observational data. SIAM, Philadelphia · Zbl 0813.62001 · doi:10.1137/1.9781611970128
[68] Ward JH Jr (1963) Hierarchical grouping to optimize an objective function. J Am Stat Assoc 58:236-244 · doi:10.1080/01621459.1963.10500845
[69] Yamamoto M (2012) Clustering of functional data in a low-dimensional subspace. Adv Data Anal Classif 6:219-247 · Zbl 1254.62077 · doi:10.1007/s11634-012-0113-3
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.