Multilevel latent class models with Dirichlet mixing distribution. (English) Zbl 1216.62164

Summary: Latent class analysis (LCA) and latent class regression (LCR) are widely used for modeling multivariate categorical outcomes in social science and biomedical studies. Standard analyses assume that data from different respondents are mutually independent, which excludes application of the methods to familial and other designs in which participants are clustered. We consider multilevel latent class models, in which subpopulation mixing probabilities are treated as random effects that vary among clusters according to a common Dirichlet distribution. We apply the expectation-maximization (EM) algorithm for model fitting by maximum likelihood (ML). This approach works well but is computationally intensive when either the number of classes or the cluster size is large. For this case, we propose a maximum pairwise likelihood (MPL) approach via a modified EM algorithm. We also show that a simple latent class analysis, combined with robust standard errors, provides another consistent and robust, but less efficient, inferential procedure. Simulation studies suggest that the three methods work well in finite samples, and that the MPL estimates often have precision comparable to that of the ML estimates. We apply our methods to the analysis of comorbid symptoms in the obsessive compulsive disorder study. The random effects structure of our models has a more straightforward interpretation than those of competing methods, and thus should usefully augment the tools available for LCA of multilevel data.
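The data-generating structure described in the summary can be illustrated with a minimal simulation sketch. It is not the authors' code: the function names, the Dirichlet parameters `alpha`, and the item response probabilities `item_probs` are hypothetical values chosen for illustration, and binary items that are conditionally independent given the class are assumed. Each cluster draws its own mixing probabilities from a common Dirichlet distribution, and each member then draws a latent class from that cluster-specific vector.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def simulate_clusters(alpha, item_probs, n_clusters, cluster_size, seed=0):
    """Simulate binary item responses for clustered respondents.

    alpha      : Dirichlet parameters, one per latent class (assumed values)
    item_probs : item_probs[c][j] = P(item j = 1 | class c) (assumed values)
    Returns a list of (mixing_vector, members) pairs, where each member is
    a (latent_class, responses) pair.
    """
    rng = random.Random(seed)
    n_classes = len(alpha)
    data = []
    for _ in range(n_clusters):
        # Cluster-specific mixing probabilities: the random effect.
        mixing = sample_dirichlet(alpha, rng)
        members = []
        for _ in range(cluster_size):
            # Class membership depends only on the cluster's mixing vector.
            c = rng.choices(range(n_classes), weights=mixing)[0]
            # Items are drawn independently given the latent class.
            responses = [int(rng.random() < p) for p in item_probs[c]]
            members.append((c, responses))
        data.append((mixing, members))
    return data

# Hypothetical setup: two latent classes, three binary items.
alpha = [2.0, 1.0]
item_probs = [[0.9, 0.8, 0.7], [0.2, 0.1, 0.3]]
data = simulate_clusters(alpha, item_probs, n_clusters=4, cluster_size=3)
```

Under a Dirichlet distribution the expected mixing probability of class c is `alpha[c] / sum(alpha)`, so in this sketch the marginal class prevalences are 2/3 and 1/3, while members of the same cluster are correlated because they share one draw of the mixing vector.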

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
62N02 Estimation in survival analysis and censored data
65C60 Computational problems in statistics (MSC2010)
