Model-based clustering for multivariate functional data. (English) Zbl 1471.62096

Summary: The first model-based clustering algorithm for multivariate functional data is proposed. After introducing multivariate functional principal components analysis (MFPCA), a parametric mixture model, based on the assumption of normality of the principal component scores, is defined and estimated by an EM-like algorithm. The main advantage of the proposed model is its ability to take into account the dependence among curves. Results on simulated and real datasets show the efficiency of the proposed method.
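The pipeline in the summary (principal component scores of multivariate curves, then a Gaussian mixture fitted by EM) can be illustrated with a small sketch. It is a deliberate simplification and not the authors' algorithm: the curves are discretized on a common grid and concatenated, a single global principal component is extracted by power iteration in place of a basis-expansion MFPCA, and a plain two-component EM is run on the first score only. The function names (`simulate`, `first_pc_scores`, `em_gmm_1d`) and the simulated data are hypothetical and chosen only for illustration.

```python
import math
import random

def simulate(n_per_group=20, n_grid=15, seed=0):
    """Two groups of bivariate curves (x1(t), x2(t)) on a common grid.
    A random amplitude shared by both components makes them dependent."""
    rng = random.Random(seed)
    grid = [j / (n_grid - 1) for j in range(n_grid)]
    data, labels = [], []
    for g, center in enumerate((-1.0, 1.0)):
        for _ in range(n_per_group):
            a = center + rng.gauss(0, 0.2)
            x1 = [a * math.sin(2 * math.pi * t) + rng.gauss(0, 0.1) for t in grid]
            x2 = [a * t + rng.gauss(0, 0.1) for t in grid]
            data.append(x1 + x2)  # concatenate the two discretized components
            labels.append(g)
    return data, labels

def first_pc_scores(data, n_iter=200):
    """Scores on the leading principal component, found by power iteration."""
    n, p = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(p)]
    X = [[row[j] - mean[j] for j in range(p)] for row in data]
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        s = [sum(x * w for x, w in zip(row, v)) for row in X]          # X v
        v = [sum(s[i] * X[i][j] for i in range(n)) for j in range(p)]  # X^T (X v)
        norm = math.sqrt(sum(w * w for w in v))
        v = [w / norm for w in v]
    return [sum(x * w for x, w in zip(row, v)) for row in X]

def em_gmm_1d(scores, n_iter=50):
    """Standard EM for a two-component univariate Gaussian mixture."""
    n = len(scores)
    mean_all = sum(scores) / n
    var_all = sum((s - mean_all) ** 2 for s in scores) / n
    mu, var, pi = [min(scores), max(scores)], [var_all, var_all], [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each score
        resp = []
        for s in scores:
            dens = [pi[k] * math.exp(-(s - mu[k]) ** 2 / (2 * var[k]))
                    / math.sqrt(2 * math.pi * var[k]) for k in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: update proportions, means and variances
        for k in range(2):
            nk = sum(r[k] for r in resp)
            pi[k] = nk / n
            mu[k] = sum(r[k] * s for r, s in zip(resp, scores)) / nk
            var[k] = max(sum(r[k] * (s - mu[k]) ** 2
                             for r, s in zip(resp, scores)) / nk, 1e-6)
    # Assign each curve to the component with the larger posterior probability
    return [0 if r[0] >= r[1] else 1 for r in resp]

if __name__ == "__main__":
    data, labels = simulate()
    clusters = em_gmm_1d(first_pc_scores(data))
    agree = sum(c == l for c, l in zip(clusters, labels)) / len(labels)
    print("agreement with simulated groups:", max(agree, 1 - agree))
```

Cluster labels are arbitrary, so agreement with the simulated groups is measured up to a swap of the two labels. A closer approximation to the summarized method would estimate the principal components within each cluster and update them inside the EM-like iterations.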

MSC:

62-08 Computational methods for problems pertaining to statistics
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62H25 Factor analysis and principal components; correspondence analysis
62R10 Functional data analysis

Software:

fda (R); funHDDC; R

References:

[1] Abraham, C.; Cornillon, P. A.; Matzner-Løber, E.; Molinari, N., Unsupervised curve clustering using \(B\)-splines, Scandinavian Journal of Statistics. Theory and Applications, 30, 3, 581-595, (2003) · Zbl 1039.91067
[2] Berrendero, J.; Justel, A.; Svarc, M., Principal components for multivariate functional data, Computational Statistics and Data Analysis, 55, 2619-2634, (2011) · Zbl 1464.62025
[3] Besse, P., 1979. Etude descriptive d’un processus. Ph.D. Thesis. Université Paul Sabatier, Toulouse.
[4] Biernacki, C., Initializing EM using the properties of its trajectories in Gaussian mixtures, Statistics and Computing, 14, 3, 267-279, (2004)
[5] Bouveyron, C.; Girard, S.; Schmid, C., High dimensional data clustering, Computational Statistics and Data Analysis, 52, 502-519, (2007) · Zbl 1452.62433
[6] Bouveyron, C.; Jacques, J., Model-based clustering of time series in group-specific functional subspaces, Advances in Data Analysis and Classification, 5, 4, 281-300, (2011) · Zbl 1274.62416
[7] Cattell, R., The scree test for the number of factors, Multivariate Behavioral Research, 1, 2, 245-276, (1966)
[8] Chiou, J.-M.; Li, P.-L., Functional clustering and identifying substructures of longitudinal data, Journal of the Royal Statistical Society. Series B. Statistical Methodology, 69, 4, 679-699, (2007) · Zbl 07555371
[9] Delaigle, A.; Hall, P., Defining probability density for a distribution of random functions, The Annals of Statistics, 38, 1171-1193, (2010) · Zbl 1183.62061
[10] Dempster, A. P.; Laird, N. M.; Rubin, D. B., Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society. Series B. Statistical Methodology, 39, 1, 1-38, (1977) · Zbl 0364.62022
[11] Deville, J., Méthodes statistiques et numériques de l’analyse harmonique, Annales de l’INSEE, 15, 3-101, (1974)
[12] Ferraty, F.; Vieu, P., Curves discrimination: a nonparametric approach, Computational Statistics and Data Analysis, 44, 161-173, (2003) · Zbl 1429.62241
[13] Frühwirth-Schnatter, S.; Kaufmann, S., Model-based clustering of multiple time series, Journal of Business and Economic Statistics, 26, 78-89, (2008)
[14] Guyon, I., Von Luxburg, U., Williamson, R., 2009. Clustering: science or art. In: NIPS 2009 Workshop on Clustering Theory.
[15] Ieva, F.; Paganoni, A. M.; Pigoli, D.; Vitelli, V., Multivariate functional clustering for the morphological analysis of electrocardiograph curves, Journal of the Royal Statistical Society: Series C (Applied Statistics), (2012)
[16] Jacques, J., Preda, C., 2012. Model-based clustering of functional data. In: 20th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. Bruges. pp. 459-464.
[17] Jacques, J., Preda, C., 2013. Funclust: a curves clustering method using functional random variable density approximation. Neurocomputing (in press).
[18] James, G.; Sugar, C., Clustering for sparsely sampled functional data, Journal of the American Statistical Association, 98, 462, 397-408, (2003) · Zbl 1041.62052
[19] Kayano, M.; Dozono, K.; Konishi, S., Functional cluster analysis via orthonormalized Gaussian basis expansions and its application, Journal of Classification, 27, 211-230, (2010) · Zbl 1337.62134
[20] Leveder, C., Abraham, C., Cornillon, P.A., Matzner-Løber, E., Molinari, N., 2004. Discrimination de courbes de pétrissage. In: Chimiométrie 2004. Paris. pp. 37-43.
[21] Liu, X.; Yang, M., Simultaneous curve registration and clustering for functional data, Computational Statistics and Data Analysis, 53, 1361-1376, (2009) · Zbl 1452.62993
[22] McLachlan, G.; Peel, D., Finite mixture models, (2000), Wiley Interscience New York · Zbl 0963.62061
[23] Olszewski, R., 2001. Generalized feature extraction for structural pattern recognition in time-series data. Ph.D. Thesis. Carnegie Mellon University. Pittsburgh, PA.
[24] Poskitt, D.; Sengarapillai, A., Description length and dimensionality reduction in functional data analysis, Computational Statistics and Data Analysis, 58, 98-113, (2013) · Zbl 1365.62225
[25] Preda, C., Regression models for functional data by reproducing kernel Hilbert spaces methods, Journal of Statistical Planning and Inference, 137, 829-840, (2007) · Zbl 1104.62043
[26] Preda, C.; Saporta, G.; Lévéder, C., PLS classification of functional data, Computational Statistics, 22, 2, 223-235, (2007) · Zbl 1196.62086
[27] Ramsay, J. O.; Silverman, B. W., Functional data analysis, Springer Series in Statistics, (2005), Springer New York · Zbl 1079.62006
[28] Ray, S.; Mallick, B., Functional clustering by Bayesian wavelet methods, Journal of the Royal Statistical Society. Series B. Statistical Methodology, 68, 2, 305-332, (2006) · Zbl 1100.62058
[29] R Core Team, 2012. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. http://www.R-project.org/.
[30] Sangalli, L.; Secchi, P.; Vantini, S.; Vitelli, V., \(K\)-means alignment for curve clustering, Computational Statistics and Data Analysis, 54, 5, 1219-1233, (2010) · Zbl 1464.62153
[31] Saporta, G., Méthodes exploratoires d’analyse de données temporelles, Cahiers du Buro, vols. 37-38, (1981)
[32] Schwarz, G., Estimating the dimension of a model, The Annals of Statistics, 6, 2, 461-464, (1978) · Zbl 0379.62005
[33] Singhal, A.; Seborg, D., Clustering multivariate time-series data, Journal of Chemometrics, 19, 427-438, (2005)
[34] Tarpey, T.; Kinateder, K., Clustering functional data, Journal of Classification, 20, 1, 93-114, (2003) · Zbl 1112.62327
[35] Titterington, D. M.; Smith, A. F. M.; Makov, U. E., Statistical analysis of finite mixture distributions, Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics, (1985), John Wiley & Sons Ltd. Chichester · Zbl 0646.62013
[36] Tokushige, S.; Yadohisa, H.; Inada, K., Crisp and fuzzy \(k\)-means clustering algorithms for multivariate functional data, Computational Statistics, 22, 1-16, (2007) · Zbl 1196.62089
[37] Tuddenham, R., Snyder, M., 1954. Physical growth of California boys and girls from birth to eighteen years. University of California Publications in Child Development 1, pp. 188-364.

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases the data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.