Bayesian clustering for continuous-time hidden Markov models. (English. French summary) Zbl 07759523

Summary: We develop clustering procedures for longitudinal trajectories based on a continuous-time hidden Markov model (CTHMM) and a generalized linear observation model. Specifically, in this article we carry out finite and infinite mixture model-based clustering for a CTHMM and achieve inference using Markov chain Monte Carlo (MCMC). For a finite mixture model with a prior on the number of components, we implement reversible-jump MCMC to facilitate the trans-dimensional move between models with different numbers of clusters. For a Dirichlet process mixture model, we utilize restricted Gibbs sampling split-merge proposals to improve the performance of the MCMC algorithm. We apply our proposed algorithms to simulated data as well as a real-data example, and the results demonstrate the desired performance of the new sampler.
{© 2021 Statistical Society of Canada}
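
Illustrative note: the mixture-based clustering described in the summary rests on evaluating, for each trajectory, the CTHMM likelihood under each cluster's parameters. The sketch below is not the authors' implementation. It assumes a two-state latent chain (so that the transition matrix exp(Q dt) has a closed form), a Poisson observation model without covariates as the simplest generalized linear case, and fixed cluster weights. All function and parameter names (transition_matrix, cthmm_loglik, cluster_responsibilities, a, b, rates) are hypothetical.

```python
import math


def transition_matrix(a, b, dt):
    """Transition probabilities P(dt) = exp(Q * dt) for a two-state chain
    with generator Q = [[-a, a], [b, -b]] (closed form, two states only)."""
    s = a + b
    decay = math.exp(-s * dt)
    return [
        [(b + a * decay) / s, (a - a * decay) / s],
        [(b - b * decay) / s, (a + b * decay) / s],
    ]


def poisson_pmf(y, rate):
    """Poisson probability of count y; stands in for the observation model."""
    return math.exp(-rate) * rate ** y / math.factorial(y)


def cthmm_loglik(times, counts, a, b, rates, init=(0.5, 0.5)):
    """Scaled forward algorithm for one trajectory observed at irregular times."""
    alpha = [init[k] * poisson_pmf(counts[0], rates[k]) for k in range(2)]
    norm = sum(alpha)
    loglik = math.log(norm)
    alpha = [x / norm for x in alpha]
    for i in range(1, len(times)):
        # The gap between visits enters only through P(dt).
        P = transition_matrix(a, b, times[i] - times[i - 1])
        alpha = [
            sum(alpha[j] * P[j][k] for j in range(2))
            * poisson_pmf(counts[i], rates[k])
            for k in range(2)
        ]
        norm = sum(alpha)
        loglik += math.log(norm)
        alpha = [x / norm for x in alpha]
    return loglik


def cluster_responsibilities(times, counts, clusters, weights):
    """Posterior cluster probabilities for one trajectory, given fixed
    cluster-specific parameters (dicts with keys a, b, rates) and weights."""
    logs = [
        math.log(w) + cthmm_loglik(times, counts, **c)
        for w, c in zip(weights, clusters)
    ]
    m = max(logs)  # subtract the maximum for numerical stability
    unnorm = [math.exp(v - m) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Example: one trajectory with uneven visit times, two candidate clusters.
times = [0.0, 0.4, 1.7, 2.0]
counts = [0, 1, 5, 6]
clusters = [
    {"a": 0.5, "b": 0.5, "rates": (1.0, 5.0)},
    {"a": 2.0, "b": 0.1, "rates": (0.5, 2.0)},
]
probs = cluster_responsibilities(times, counts, clusters, weights=[0.5, 0.5])
```

In an MCMC scheme of the kind the summary describes, probabilities like these would drive the cluster-allocation update, while the cluster parameters and the number of clusters are themselves sampled rather than fixed as they are here.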

MSC:

62-XX Statistics
60J22 Computational methods in Markov chains
62F15 Bayesian inference
62H30 Classification and discrimination; cluster analysis (statistical aspects)
