
Mixture of multivariate \(t\) nonlinear mixed models for multiple longitudinal data with heterogeneity and missing values. (English) Zbl 1420.62290

Summary: The multivariate \(t\) nonlinear mixed-effects model (MtNLMM) has been shown to be effective for analyzing multi-outcome longitudinal data that follow nonlinear growth patterns and are subject to heavy-tailed noise or potential outliers. This paper considers the problem of clustering heterogeneous longitudinal profiles within a mixture framework of MtNLMMs. A finite mixture of multivariate \(t\) nonlinear mixed models is proposed, which accommodates more complex features of longitudinal data. Intermittent missing values frequently occur when multiple repeated measures are collected. Under a missing at random mechanism, a pseudo-data version of the alternating expectation-conditional maximization algorithm is developed to carry out maximum likelihood estimation and impute missing values simultaneously. Techniques for clustering incomplete multiple trajectories, recovering missing responses, and allocating future subjects are also investigated. The practical utility of the method is demonstrated on a real data set from a study of 124 normal and 37 abnormal pregnant women. Simulation studies are provided to validate the proposed approach.
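
The allocation step mentioned in the summary can be illustrated in a much simpler setting. The sketch below is not the MtNLMM of the paper: it assumes a univariate Student-\(t\) mixture with known parameters, and the function names (`t_logpdf`, `posterior_probs`, `allocate`) and all numerical values are hypothetical. It only shows how an observation might be assigned to the component with the largest posterior probability.

```python
import math


def t_logpdf(y, mu, sigma, nu):
    """Log-density of a univariate Student-t with location mu, scale sigma, nu d.o.f."""
    z = (y - mu) / sigma
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi) - math.log(sigma)
            - (nu + 1) / 2 * math.log1p(z * z / nu))


def posterior_probs(y, weights, mus, sigmas, nus):
    """Posterior probability that y belongs to each mixture component."""
    logs = [math.log(w) + t_logpdf(y, m, s, v)
            for w, m, s, v in zip(weights, mus, sigmas, nus)]
    top = max(logs)  # subtract the maximum before exponentiating (log-sum-exp)
    unnorm = [math.exp(value - top) for value in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def allocate(y, weights, mus, sigmas, nus):
    """Index of the component with the largest posterior probability (MAP rule)."""
    probs = posterior_probs(y, weights, mus, sigmas, nus)
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical two-component mixture; the values are illustrative only.
WEIGHTS, MUS, SIGMAS, NUS = [0.5, 0.5], [0.0, 5.0], [1.0, 1.0], [4.0, 4.0]
print(allocate(0.1, WEIGHTS, MUS, SIGMAS, NUS))
print(allocate(4.8, WEIGHTS, MUS, SIGMAS, NUS))
```

In the multivariate longitudinal setting the same rule would be applied with each component's multivariate \(t\) density in place of the univariate one. Handling missing responses would require additional steps not shown here.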

MSC:

62J02 General nonlinear regression
62H30 Classification and discrimination; cluster analysis (statistical aspects)
