Abstract
Motivated by the increasing use of discrete-state Markov processes across applied disciplines, a Metropolis–Hastings sampling algorithm is proposed for a partially observed process. Current approaches, both classical and Bayesian, have relied on imputing the missing parts of the process and working with a complete likelihood. However, from a Bayesian perspective, the use of latent variables is not necessary, and exploiting the observed likelihood function, combined with a suitable Markov chain Monte Carlo method, results in an accurate and efficient approach. A comprehensive comparison on simulated and real data sets demonstrates the performance of our approach relative to alternatives available in the literature.
References
Al-Mohy AH, Higham NJ (2011) Computing the action of the matrix exponential, with an application to exponential integrators. SIAM J Sci Comput 33(2):488–511
Amoros R, King R, Toyoda H, Kumada T, Johnson PJ, Bird TG (2019) A continuous-time hidden Markov model for cancer surveillance using serum biomarkers with application to hepatocellular carcinoma. Metron 77(2):67–86
Bladt M, Sorensen M (2005) Statistical inference for discretely observed Markov jump processes. J Roy Stat Soc B 67:395–410
dos Reis G, Smith G (2018) Robust and consistent estimation of generators in credit risk. Quant Financ 18:983–1001
Fearnhead P, Sherlock C (2006) An exact Gibbs sampler for the Markov-modulated Poisson process. J R Stat Soc: Ser B (Stat Methodol) 68:767–784
Fukaya K, Royle JA (2013) Markov models for community dynamics allowing for observation error. Ecology 94:2670–2677
Gelman A, Rubin DB (1992) Inference from iterative simulation using multiple sequences. Stat Sci 7(4):457–472
Georgoulas A, Hillston J, Sanguinetti G (2017) Unbiased Bayesian inference for population Markov jump processes via random truncations. Stat Comput 27(4):991–1002
Gillespie DT (1977) Exact stochastic simulation of coupled chemical reactions. J Phys Chem 81(25):2340–2361
Goulet V, Dutang C, Maechler M, Firth D, Shapira M, Stadelmann M (2021) Package ‘expm’
Grimmett GR, Stirzaker DR (1982) Probability and Random Processes. Oxford University Press
Higham NJ (2005) The scaling and squaring method for the matrix exponential revisited. SIAM J Matrix Anal Appl 26:1179–1193
Inamura Y (2006) Estimating continuous time transition matrices from discretely observed data. Bank of Japan, No.06-E07
Israel RB, Rosenthal JS, Wei JS (2001) Finding generators for Markov chains via empirical transition matrices, with applications to credit ratings. Math Financ 11:245–265
Norris JR (1998) Markov Chains. Cambridge University Press
Pardoux E (2008) Markov processes and applications. Algorithms, networks, genome and finance. Wiley
Pfeuffer M, Möstel L, Fischer M (2019) An extended likelihood framework for modelling discretely observed credit rating transitions. Quant Financ 19:93–104
Pfeuffer M (2017) ctmcd: An R Package for Estimating the Parameters of a Continuous-Time Markov Chain from Discrete-Time Data. R J 9:127–141
Plummer M (2003) JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. In: Proceedings of the 3rd international workshop on distributed statistical computing, vol 124, No. 125.10, pp 1–10
Rao V, Teh YW (2013) Fast MCMC sampling for Markov jump processes and extensions. J Mach Learn Res 14(1):3295–3320
Sherlock C, Fearnhead P, Roberts GO (2010) The random walk Metropolis: linking theory and practice through a case study. Stat Sci 25(2):172–190
Sauer M, Stannat W (2016) Reliability of signal transmission in stochastic nerve axon equations. J Comput Neurosci 40:103–111
Van Kampen NG (2007) Stochastic processes in physics and chemistry. North–Holland
Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, van Mulbregt P (2020) SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods 17(3):261–272
Zhao T, Wang Z, Cumberworth A, Gsponer J, de Freitas N, Bouchard-Côté A (2016) Bayesian analysis of continuous time Markov chains with application to phylogenetic modelling. Bayesian Anal 11(4):1203–1237
Acknowledgements
The authors are grateful for the support of CONTEX project 2018-9 and PAPIIT-UNAM project IG100221. The authors are also grateful for the comments and suggestions of a referee, which have greatly improved the paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Metropolis–Hastings algorithm (5) details
The Metropolis–Hastings moves used in our work are the following:
-
For \(\lambda \in \Lambda = \left\{ \lambda _1,\ldots ,\lambda _m\right\} \) let \(g\,:\, {\mathbb {R}}\rightarrow {\mathbb {R}}^+\) be given by \(g(x)=e^x\) and propose moves from \(\lambda \) to \({\tilde{\lambda }}\) by
$$\begin{aligned} {\tilde{\lambda }}= g( g^{-1}(\lambda ) +Z ) \end{aligned}$$with \(Z\sim \text {Normal}(0,1)\) so \({\tilde{\lambda }}\) has a LogNormal distribution. The corresponding transition kernel is given by \(q({\tilde{\lambda }}\,|\,\lambda )\propto e^{-0.5(\log (\lambda )-\log ({\tilde{\lambda }}))^2}{\tilde{\lambda }}^{-1}\).
-
For transition matrix P with no zeros in off-diagonal entries and
\(\pmb {p}\in \left\{ \left( p_{j,1},\ldots ,p_{j,j-1},p_{j,j+1},\ldots ,p_{j,m} \right) \, : \, 1 \le j \le m \right\} \), let \(c>0\) and propose
$$\begin{aligned} \tilde{\pmb {p}}|\pmb {p}\sim \text {Dirichlet}(\pmb {1}+c\pmb {p}) \end{aligned}$$with \(\pmb {1}=(1,\ldots ,1)\) so \(\tilde{\pmb {p}}\) has mode \(\pmb {p}\).
-
For transition matrix P with zeros in off-diagonal entries and
\(p\in \left\{ p_{j,k} \, : \, 1 \le j\le m,\,1 \le k\le m,\, j\ne k\right\} \), let \(g\,:\, {\mathbb {R}}\rightarrow {\mathbb {R}}^+\) be a standard logistic function \(g(x)=e^x/(1+e^x)\) and propose moves from p to \({\tilde{p}}\) by
$$\begin{aligned} {\tilde{p}}= g( g^{-1}(p) +Z ) \end{aligned}$$with \(Z\sim \text {Normal}(0,1)\) so \({\tilde{p}}\) has a LogitNormal distribution. Denote \(\pmb {w}\) as the vector \(\pmb {p}\) with entry value p changed to \({\tilde{p}}\). Finally we do a normalization
$$\begin{aligned} \tilde{\pmb {p}} = \pmb {w}/\sum _{k=1}^m w_k. \end{aligned}$$The transformation \( f(x_1,\ldots ,x_m) = \left( \frac{x_1}{\sum _{i=1}^m x_i} , \ldots ,\frac{x_{m-1}}{\sum _{i=1}^m x_i},\sum _{i=1}^m x_i \right) \) is known to have Jacobian \(\left( \sum _{i=1}^m x_i \right) ^{-m+1}\). So the corresponding transition kernel is given by \(q(\tilde{\pmb {p}}\,|\,\pmb {p})\propto \exp \left( -0.5(\text {logit}(p_k)-\text {logit}({\tilde{p}}))^2\right) \frac{ \left( {\tilde{p}} + \sum _{i\ne k}p_i \right) ^{m-1}}{{\tilde{p}}(1-{\tilde{p}})}\). Note that we do not marginalize the random variable \(S=\sum _{i=1}^m w_i\) so we have to draw simulations of the auxiliary variable \({\tilde{p}}\) to evaluate the above conditional density.
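The three moves above can be sketched in Python as follows (a minimal illustration under the stated distributions; the function names and the tuning values `sd` and `c` are our own choices, not the authors' implementation):

```python
import numpy as np
from scipy.stats import dirichlet

rng = np.random.default_rng(0)

def propose_rate(lam, sd=0.05):
    """LogNormal random-walk move for a rate lambda > 0.
    Also returns log q(lam | lam_new) - log q(lam_new | lam),
    which for this walk on the log scale reduces to log(lam_new/lam)."""
    lam_new = np.exp(np.log(lam) + sd * rng.standard_normal())
    return lam_new, np.log(lam_new) - np.log(lam)

def propose_row(p, c=100.0):
    """Dirichlet(1 + c*p) proposal for an off-diagonal row p of the
    transition matrix, so the proposal has mode p; also returns the
    log proposal-density ratio for the acceptance probability."""
    p_new = rng.dirichlet(1.0 + c * p)
    log_ratio = (dirichlet.logpdf(p, 1.0 + c * p_new)
                 - dirichlet.logpdf(p_new, 1.0 + c * p))
    return p_new, log_ratio

def propose_entry(p, k, sd=0.05):
    """LogitNormal move on entry k of row p, then renormalise the row."""
    w = p.copy()
    z = np.log(p[k] / (1.0 - p[k])) + sd * rng.standard_normal()
    w[k] = 1.0 / (1.0 + np.exp(-z))
    return w / w.sum()
```

In each case the returned log proposal ratio would enter the Metropolis–Hastings acceptance probability together with the likelihood and prior ratios.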
Appendix B: Second simulation study details
For a generator matrix
$$\begin{aligned} G = \begin{pmatrix} -\lambda _1 &{} \lambda _1 \\ \lambda _2 &{} -\lambda _2 \end{pmatrix} \end{aligned}$$with \(\lambda _1,\lambda _2>0\), we have explicitly that
$$\begin{aligned} e^{Gt} = \frac{1}{\lambda _1+\lambda _2} \begin{pmatrix} \lambda _2+\lambda _1 e^{-(\lambda _1+\lambda _2)t} &{} \lambda _1\left( 1-e^{-(\lambda _1+\lambda _2)t}\right) \\ \lambda _2\left( 1-e^{-(\lambda _1+\lambda _2)t}\right) &{} \lambda _1+\lambda _2 e^{-(\lambda _1+\lambda _2)t} \end{pmatrix}. \end{aligned}$$For discrete observations over a time mesh \(\Delta k\), with \(\Delta \in {\mathbb {R}}^+\) and \(k \in \{1,\ldots , n\}\) for some \(n\in {\mathbb {N}}\), of the Markov chain associated with G, let \(n^{eq}_i\) be the number of times the state remains equal for transitions starting in state i, and \(n^{ch}_i\) the number of times the state changes for transitions starting in state i along the time mesh, with \(i\in \{1,2\}\). The likelihood is readily seen to be
$$\begin{aligned} L(\lambda _1,\lambda _2) = \prod _{i=1}^{2} \left( e^{G\Delta }\right) _{i,i}^{\,n^{eq}_i} \left( e^{G\Delta }\right) _{i,3-i}^{\,n^{ch}_i}. \end{aligned}$$
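This two-state likelihood can be evaluated numerically as follows (a sketch; `two_state_loglik` is a name of our own, and `scipy.linalg.expm` stands in for the closed-form transition probabilities of the generator with rates \(\lambda _1,\lambda _2\)):

```python
import numpy as np
from scipy.linalg import expm

def two_state_loglik(lam1, lam2, delta, n_eq, n_ch):
    """Log-likelihood of a discretely observed two-state CTMC.
    n_eq[i] / n_ch[i]: number of mesh transitions out of state i
    (0-indexed) that keep / change the state over a step delta."""
    G = np.array([[-lam1, lam1], [lam2, -lam2]])
    P = expm(G * delta)  # transition matrix over one mesh step
    return (n_eq[0] * np.log(P[0, 0]) + n_ch[0] * np.log(P[0, 1])
            + n_eq[1] * np.log(P[1, 1]) + n_ch[1] * np.log(P[1, 0]))
```

As a sanity check, `P[0, 0]` agrees with the standard closed-form entry \((\lambda _2+\lambda _1 e^{-(\lambda _1+\lambda _2)\Delta })/(\lambda _1+\lambda _2)\).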
Simulations for the above CTMC with \((\lambda _1,\lambda _2)=(1,1)\) and \((\lambda _1,\lambda _2)=(2,1)\) were drawn. The prior distributions were \(\lambda \sim \text {Gamma}(1,1)\) and \(\left\{ p_{j,k} \right\} _{j\ne k}\sim \text {Dirichlet}(\pmb {1})\) for all the experiments. A CTMC was generated until 1000 transitions were obtained and we considered the discrete observations given by times \(\Delta k\) with \(\Delta =1\) and \(1\le k \le \min \left\{ n \, ;\ X(n\Delta )\text { was fully observed}\right\} \).
In Figs. 6 and 7 we compare the true posterior distributions given the simulated data with the realizations of \(\lambda _1\) and \(\lambda _2\) from Gibbs sampler 1, while in Figs. 8 and 9 we do the same experiment for Gibbs sampler 2, and in Figs. 10 and 11 for the Metropolis–Hastings sampler. The choice of priors was as indicated above for all three samplers. The variance of the LogNormal proposals for \(\lambda _i\) in the Metropolis–Hastings sampler was set to 0.0025. We ran 50000 iterations of each sampler after having diagnosed stationarity via the potential scale reduction factor of Gelman and Rubin (1992). We highlight that the run time for the Metropolis–Hastings algorithm is significantly shorter than for the Gibbs samplers, as shown in Tables 1 and 2. This is due to the computational cost of simulating the conditioned trajectories between discretely observed times for the CTMC.
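The stationarity diagnostic mentioned above can be sketched as follows (a minimal version of the Gelman and Rubin (1992) potential scale reduction factor, without the degrees-of-freedom correction; not the exact implementation used for the paper):

```python
import numpy as np

def psrf(chains):
    """Gelman-Rubin potential scale reduction factor.
    chains: array of shape (m, n), m parallel chains of length n,
    all started from dispersed initial values."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)        # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()  # within-chain variance
    var_plus = (n - 1) / n * W + B / n     # pooled variance estimate
    return np.sqrt(var_plus / W)           # ~1 at stationarity
```

Values close to 1 across all parameters are taken as evidence that the chains have reached stationarity.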
Appendix C: Further comparisons for the third simulation study
In this appendix we present further mean-generator comparisons between the Gibbs and Metropolis–Hastings samplers for discretely observed CTMCs. We also present comparisons of generators fitted with the EM algorithm and with our Metropolis–Hastings approach.
1.1 C.1: Gibbs and Metropolis–Hastings comparisons
In Figs. 13 and 15 we present the mean generator comparisons for the Gibbs and Metropolis–Hastings samplers, with simulation studies determined, respectively, by \(\Delta = 0.5\) and \(\Delta = 2\) (Figs. 12, 13, 14, 15, 16, 17, 18, 19, 20).
As mentioned in the main document, the entry-wise distance between the Gibbs mean generator and the true generator tends to be greater than that between the Metropolis–Hastings mean generator and the true generator; this is particularly visible in the mean generators for \(\Delta =0.5\), as shown in Table 1 of the main document in terms of Frobenius distances.
1.2 C.2: EM and Metropolis–Hastings comparison
We use the EM algorithm of the ctmcd R package, see Pfeuffer (2017), to compare with the proposed Metropolis–Hastings algorithm in the context of the third simulation study.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Riva-Palacio, A., Mena, R.H. & Walker, S.G. On the estimation of partially observed continuous-time Markov chains. Comput Stat 38, 1357–1389 (2023). https://doi.org/10.1007/s00180-022-01273-w