
Clustering of longitudinal curves via a penalized method and EM algorithm. (English) Zbl 07876424

Summary: This article proposes a new method for clustering longitudinal curves. Clusters of mean functions are identified through a weighted concave pairwise fusion method. The EM algorithm and the alternating direction method of multipliers (ADMM) are combined to estimate the group structure, mean functions and principal components simultaneously. Adding pairwise weights to the pairwise penalties also allows prior neighborhood information to be incorporated, giving more meaningful groups. In the simulation study, the proposed method is compared with some existing clustering methods in terms of accuracy in estimating the number of subgroups and the mean functions. The results suggest that ignoring the covariance structure strongly affects both the estimation of the number of groups and the estimation accuracy. The effect of including pairwise weights is also explored in a spatial lattice setting in order to take spatial information into account. The results show that incorporating spatial weights improves performance. A real data example illustrates the proposed method.
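
The summary does not state the penalty explicitly, so the following is only an illustrative sketch of what a weighted concave pairwise fusion penalty can look like, not the authors' implementation. It assumes the minimax concave penalty (MCP) of reference [48] as the concave penalty, assumes that each pairwise weight rescales the tuning parameter for its pair, and uses hypothetical names throughout (`mcp`, `weighted_fusion_penalty`, `groups_from_fusion`, `betas`, `weights`). The EM and ADMM steps are not shown.

```python
import math
from itertools import combinations


def mcp(t, lam, gamma=3.0):
    """Minimax concave penalty evaluated at |t| (gamma = 3.0 is an arbitrary default)."""
    t = abs(t)
    if t <= gamma * lam:
        return lam * t - t * t / (2.0 * gamma)
    # Beyond gamma * lam the penalty is flat, so large differences are not shrunk further.
    return 0.5 * gamma * lam * lam


def weighted_fusion_penalty(betas, weights, lam, gamma=3.0):
    """Sum over pairs i < j of MCP(||beta_i - beta_j||, weights[i][j] * lam).

    Assumption: the weight multiplies the tuning parameter, so a larger
    weight (e.g. for neighboring units) pushes that pair harder to fuse.
    """
    total = 0.0
    for i, j in combinations(range(len(betas)), 2):
        dist = math.dist(betas[i], betas[j])
        total += mcp(dist, weights[i][j] * lam, gamma)
    return total


def groups_from_fusion(betas, tol=1e-8):
    """Give two curves the same label when their coefficient vectors coincide (within tol)."""
    labels = []
    representatives = []  # one coefficient vector per group found so far
    for beta in betas:
        for g, rep in enumerate(representatives):
            if math.dist(beta, rep) <= tol:
                labels.append(g)
                break
        else:
            representatives.append(beta)
            labels.append(len(representatives) - 1)
    return labels


# Toy example: three curves, each summarized by two basis coefficients.
betas = [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0]]
weights = [[1.0] * 3 for _ in range(3)]  # equal weights, i.e. no neighborhood information
penalty = weighted_fusion_penalty(betas, weights, lam=1.0)
labels = groups_from_fusion(betas)
```

In this toy input the first two coefficient vectors are identical, so they contribute nothing to the penalty and share a group label; only the two pairs involving the third curve are penalized.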

MSC:

62-08 Computational methods for problems pertaining to statistics

References:

[1] Basu S, Banerjee A, Mooney R.J (2004) Active semi-supervision for pairwise constrained clustering. In: Proceedings of the 2004 SIAM international conference on data mining. SIAM, pp. 333-344
[2] Bouveyron, C.; Côme, E.; Jacques, J., The discriminative functional mixture model for a comparative analysis of bike sharing systems, Ann Appl Stat, 9, 4, 1726-1760, 2015 · Zbl 1397.62511
[3] Bouveyron, C.; Jacques, J., Model-based clustering of time series in group-specific functional subspaces, Adv Data Anal Classif, 5, 4, 281-300, 2011 · Zbl 1274.62416
[4] Boyd, S.; Parikh, N.; Chu, E.; Peleato, B.; Eckstein, J., Distributed optimization and statistical learning via the alternating direction method of multipliers, Found Trends Mach Learn, 3, 1, 1-122, 2011 · Zbl 1229.90122
[5] Chi, EC; Lange, K., Splitting methods for convex clustering, J Comput Graph Stat, 24, 4, 994-1013, 2015
[6] Chiou, JM; Li, PL, Functional clustering and identifying substructures of longitudinal data, J R Stat Soc Ser B (Stat Methodol), 69, 4, 679-699, 2007 · Zbl 07555371
[7] Chiou, JM; Li, PL, Correlation-based functional clustering via subspace projection, J Am Stat Assoc, 103, 484, 1684-1692, 2008 · Zbl 1286.62058
[8] Coffey, N.; Hinde, J.; Holian, E., Clustering longitudinal profiles using P-splines and mixed effects models applied to time-course gene expression data, Comput Stat Data Anal, 71, 14-29, 2014 · Zbl 1471.62045
[9] Daawin, P.; Kim, S.; Miljkovic, T., Predictive modeling of obesity prevalence for the US population, N Am Actuar J, 23, 1, 64-81, 2019 · Zbl 1411.91275
[10] de Amorim RC (2012) Constrained clustering with Minkowski weighted k-means. In: 2012 IEEE 13th international symposium on computational intelligence and informatics (CINTI). IEEE, pp. 13-17
[11] De Boor, C., A practical guide to splines, 2001, New York, NY: Springer · Zbl 0987.65015
[12] Fan, J.; Li, R., Variable selection via nonconcave penalized likelihood and its oracle properties, J Am Stat Assoc, 96, 456, 1348-1360, 2001 · Zbl 1073.62547
[13] Fang, K.; Chen, Y.; Ma, S.; Zhang, Q., Biclustering analysis of functionals via penalized fusion, J Multivar Anal, 189, 104874, 2022 · Zbl 1493.62381
[14] Foulds J, Kumar S, Getoor L (2015) Latent topic networks: a versatile probabilistic programming framework for topic models. In: International conference on machine learning. PMLR, pp. 777-786
[15] Hales CM, Carroll MD, Fryar CD, Ogden CL (2017) Prevalence of obesity among adults and youth: United States, 2015-2016. NCHS data brief (288)
[16] Huang, H.; Li, Y.; Guan, Y., Joint modeling and clustering paired generalized longitudinal trajectories with application to cocaine abuse treatment data, J Am Stat Assoc, 109, 508, 1412-1424, 2014
[17] Hubert, L.; Arabie, P., Comparing partitions, J Classif, 2, 1, 193-218, 1985
[18] Ibrahim, JG; Zhu, H.; Tang, N., Model selection criteria for missing-data problems using the EM algorithm, J Am Stat Assoc, 103, 484, 1648-1658, 2008 · Zbl 1286.62082
[19] Jacques, J.; Preda, C., Funclust: a curves clustering method using functional random variables density approximation, Neurocomputing, 112, 164-171, 2013
[20] Jacques, J.; Preda, C., Functional data clustering: a survey, Adv Data Anal Classif, 8, 3, 231-255, 2014 · Zbl 1414.62018
[21] Jain, AK, Data clustering: 50 years beyond K-means, Pattern Recogn Lett, 31, 8, 651-666, 2010
[22] James, GM; Hastie, TJ; Sugar, CA, Principal component models for sparse functional data, Biometrika, 87, 3, 587-602, 2000 · Zbl 0962.62056
[23] James, GM; Sugar, CA, Clustering for sparsely sampled functional data, J Am Stat Assoc, 98, 462, 397-408, 2003 · Zbl 1041.62052
[24] Jiang, H.; Serban, N., Clustering random curves under spatial interdependence with application to service accessibility, Technometrics, 54, 2, 108-119, 2012
[25] Li, T.; Song, X.; Zhang, Y.; Zhu, H.; Zhu, Z., Clusterwise functional linear regression models, Comput Stat Data Anal, 158, 107192, 2021 · Zbl 1510.62298
[26] Li, Y.; Hsing, T., Uniform convergence rates for nonparametric regression and principal component analysis in functional/longitudinal data, Ann Stat, 38, 6, 3321-3351, 2010 · Zbl 1204.62067
[27] Li, Y.; Wang, N.; Carroll, RJ, Selecting the number of principal components in functional data, J Am Stat Assoc, 108, 504, 1284-1294, 2013 · Zbl 1288.62102
[28] Luan, Y.; Li, H., Clustering of time-course gene expression data using a mixed-effects model with B-splines, Bioinformatics, 19, 4, 474-482, 2003
[29] Lv, Y.; Zhu, X.; Zhu, Z.; Qu, A., Nonparametric cluster analysis on multiple outcomes of longitudinal data, Stat Sin, 30, 4, 1829-1856, 2020 · Zbl 1466.62356
[30] Ma, H.; Liu, C.; Xu, S.; Yang, J., Subgroup analysis for functional partial linear regression model, Can J Stat, 51, 2, 559-579, 2023 · Zbl 07759544
[31] Ma, S.; Huang, J., A concave pairwise fusion approach to subgroup analysis, J Am Stat Assoc, 112, 517, 410-423, 2017
[32] Ma S, Huang J, Zhang Z, Liu M (2020) Exploration of heterogeneous treatment effects via concave fusion. Int J Biostat 16(1):20180026. doi:10.1515/ijb-2018-0026
[33] Miljkovic, T.; Wang, X., Identifying subgroups of age and cohort effects in obesity prevalence, Biom J, 63, 1, 168-186, 2021 · Zbl 1523.62168
[34] Ng, SK; McLachlan, GJ; Wang, K.; Ben-Tovim Jones, L.; Ng, SW, A mixture model with random-effects components for clustering correlated gene-expression profiles, Bioinformatics, 22, 14, 1745-1752, 2006
[35] Peng, J.; Müller, HG, Distance-based clustering of sparsely observed stochastic processes, with applications to online auctions, Ann Appl Stat, 2, 3, 1056-1077, 2008 · Zbl 1149.62053
[36] Ramsay, JO; Silverman, BW, Functional data analysis, 2005, New York: Springer · Zbl 1079.62006
[37] Rand, WM, Objective criteria for the evaluation of clustering methods, J Am Stat Assoc, 66, 336, 846-850, 1971
[38] Redd, A., A comment on the orthogonalization of B-spline basis functions and their derivatives, Stat Comput, 22, 1, 251-257, 2012 · Zbl 1322.62034
[39] Ren, M.; Zhang, S.; Zhang, Q.; Ma, S., Gaussian graphical model-based heterogeneity analysis via penalized fusion, Biometrics, 78, 2, 524-535, 2022 · Zbl 1520.62312
[40] Sangalli, LM; Secchi, P.; Vantini, S.; Vitelli, V., K-mean alignment for curve clustering, Comput Stat Data Anal, 54, 5, 1219-1233, 2010 · Zbl 1464.62153
[41] Sugar, CA; James, GM, Finding the number of clusters in a dataset: an information-theoretic approach, J Am Stat Assoc, 98, 463, 750-763, 2003 · Zbl 1046.62064
[42] Tibshirani, R., Regression shrinkage and selection via the lasso, J R Stat Soc Ser B (Methodol), 58, 1, 267-288, 1996 · Zbl 0850.62538
[43] Vinh, NX; Epps, J.; Bailey, J., Information theoretic measures for clusterings comparison: variants, properties, normalization and correction for chance, J Mach Learn Res, 11, 2837-2854, 2010 · Zbl 1242.62062
[44] Wang, H.; Li, R.; Tsai, CL, Tuning parameter selectors for the smoothly clipped absolute deviation method, Biometrika, 94, 3, 553-568, 2007 · Zbl 1135.62058
[45] Wang, X.; Zhu, Z.; Zhang, HH, Spatial heterogeneity automatic detection and estimation, Comput Stat Data Anal, 180, 107667, 2023 · Zbl 07708621
[46] Xiao, P.; Wang, G., Partial functional linear regression with autoregressive errors, Commun Stat Theory Methods, 51, 13, 4515-4536, 2022 · Zbl 07562246
[47] Yao, F.; Müller, HG; Wang, JL, Functional data analysis for sparse longitudinal data, J Am Stat Assoc, 100, 470, 577-590, 2005 · Zbl 1117.62451
[48] Zhang, CH, Nearly unbiased variable selection under minimax concave penalty, Ann Stat, 38, 2, 894-942, 2010 · Zbl 1183.62120
[49] Zhang, X.; Zhang, Q.; Ma, S.; Fang, K., Subgroup analysis for high-dimensional functional regression, J Multivar Anal, 192, 105100, 2022 · Zbl 1520.62034
[50] Zhou, L.; Huang, JZ; Carroll, RJ, Joint modelling of paired sparse functional data using principal components, Biometrika, 95, 3, 601-619, 2008 · Zbl 1437.62676
[51] Zhou, L.; Sun, S.; Fu, H.; Song, PXK, Subgroup-effects models for the analysis of personal treatment effects, Ann Appl Stat, 16, 1, 80-103, 2022 · Zbl 1498.62278
[52] Zhu, X.; Qu, A., Cluster analysis of longitudinal profiles with subgroups, Electron J Stat, 12, 1, 171-193, 2018 · Zbl 1393.62032
[53] Zhu, X.; Tang, X.; Qu, A., Longitudinal clustering for heterogeneous binary data, Stat Sin, 31, 2, 603-624, 2021 · Zbl 1469.62406
[54] Zhu, Y.; Di, C.; Chen, YQ, Clustering functional data with application to electronic medication adherence monitoring in HIV prevention trials, Stat Biosci, 11, 2, 238-261, 2019
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.