Abstract
Motivated by the increasing use of discrete-state Markov processes across applied disciplines, a Metropolis–Hastings sampling algorithm is proposed for a partially observed process. Current approaches, both classical and Bayesian, have relied on imputing the missing parts of the process and working with a complete likelihood. However, from a Bayesian perspective, the use of latent variables is not necessary, and exploiting the observed likelihood function, combined with a suitable Markov chain Monte Carlo method, results in an accurate and efficient approach. A comprehensive comparison on simulated and real data sets demonstrates the performance of our approach relative to alternatives available in the literature.
References
Al-Mohy AH, Higham NJ (2011) Computing the action of the matrix exponential, with an application to exponential integrators. SIAM J Sci Comput 33(2):488–511
Amoros R, King R, Toyoda H, Kumada T, Johnson PJ, Bird TG (2019) A continuous-time hidden Markov model for cancer surveillance using serum biomarkers with application to hepatocellular carcinoma. Metron 77(2):67–86
Bladt M, Sorensen M (2005) Statistical inference for discretely observed Markov jump processes. J Roy Stat Soc B 67:395–410
dos Reis G, Smith G (2018) Robust and consistent estimation of generators in credit risk. Quant Financ 18:983–1001
Fearnhead P, Sherlock C (2006) An exact Gibbs sampler for the Markov-modulated Poisson process. J R Stat Soc: Ser B (Stat Methodol) 68:767–784
Fukaya K, Royle JA (2013) Markov models for community dynamics allowing for observation error. Ecology 94:2670–2677
Gelman A, Rubin DB (1992) Inference from iterative simulation using multiple sequences. Stat Sci 7(4):457–472
Georgoulas A, Hillston J, Sanguinetti G (2017) Unbiased Bayesian inference for population Markov jump processes via random truncations. Stat Comput 27(4):991–1002
Gillespie DT (1977) Exact stochastic simulation of coupled chemical reactions. J Phys Chem 81(25):2340–2361
Goulet V, Dutang C, Maechler M, Firth D, Shapira M, Stadelmann M (2021) Package ‘expm’
Grimmett GR, Stirzaker DR (1982) Probability and Random Processes. Oxford University Press
Higham NJ (2005) The scaling and squaring method for the matrix exponential revisited. SIAM J Matrix Anal Appl 26:1179–1193
Inamura Y (2006) Estimating continuous time transition matrices from discretely observed data. Bank of Japan, No.06-E07
Israel RB, Rosenthal JS, Wei JS (2001) Finding generators for Markov chains via empirical transition matrices, with applications to credit ratings. Math Financ 11:245–265
Norris JR (1998) Markov Chains. Cambridge University Press
Pardoux E (2008) Markov processes and applications. Algorithms, networks, genome and finance. Wiley
Pfeuffer M, Möstel L, Fischer M (2019) An extended likelihood framework for modelling discretely observed credit rating transitions. Quant Financ 19:93–104
Pfeuffer M (2017) ctmcd: An R Package for Estimating the Parameters of a Continuous-Time Markov Chain from Discrete-Time Data. R J 9:127–141
Plummer M (2003) JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. In: Proceedings of the 3rd international workshop on distributed statistical computing, vol 124, No. 125.10, pp 1–10
Rao V, Teh YW (2013) Fast MCMC sampling for Markov jump processes and extensions. J Mach Learn Res 14(1):3295–3320
Sherlock C, Fearnhead P, Roberts GO (2010) The random walk Metropolis: linking theory and practice through a case study. Stat Sci 25(2):172–190
Sauer M, Stannat W (2016) Reliability of signal transmission in stochastic nerve axon equations. J Comput Neurosci 40:103–111
Van Kampen NG (2007) Stochastic processes in physics and chemistry. North–Holland
Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, van Mulbregt P (2020) SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods 17(3):261–272
Zhao T, Wang Z, Cumberworth A, Gsponer J, de Freitas N, Bouchard-Côté A (2016) Bayesian analysis of continuous time Markov chains with application to phylogenetic modelling. Bayesian Anal 11(4):1203–1237
Acknowledgements
The authors are grateful for the support of CONTEX project 2018-9 and PAPIIT-UNAM project IG100221. The authors are also grateful for the comments and suggestions of a referee, which have greatly improved the paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Metropolis–Hastings algorithm (5) details
The Metropolis–Hastings moves used in our work are the following:
-
For \(\lambda \in \Lambda = \left\{ \lambda _1,\ldots ,\lambda _m\right\} \) let \(g\,:\, {\mathbb {R}}\rightarrow {\mathbb {R}}^+\) be given by \(g(x)=e^x\) and propose moves from \(\lambda \) to \({\tilde{\lambda }}\) by
$$\begin{aligned} {\tilde{\lambda }}= g( g^{-1}(\lambda ) +Z ) \end{aligned}$$with \(Z\sim \text {Normal}(0,1)\) so \({\tilde{\lambda }}\) has a LogNormal distribution. The corresponding transition kernel is given by \(q({\tilde{\lambda }}\,|\,\lambda )\propto e^{-0.5(\log (\lambda )-\log ({\tilde{\lambda }}))^2}{\tilde{\lambda }}^{-1}\).
-
For transition matrix P with no zeros in off-diagonal entries and
\(\pmb {p}\in \left\{ \left( p_{j,1},\ldots ,p_{j,j-1},p_{j,j+1},\ldots ,p_{j,m} \right) \, : \, 1 \le j \le m \right\} \), let \(c>0\) and propose
$$\begin{aligned} \tilde{\pmb {p}}|\pmb {p}\sim \text {Dirichlet}(\pmb {1}+c\pmb {p}) \end{aligned}$$with \(\pmb {1}=(1,\ldots ,1)\) so \(\tilde{\pmb {p}}\) has mode \(\pmb {p}\).
-
For transition matrix P with zeros in off-diagonal entries and
\(p\in \left\{ p_{j,k} \, : \, 1 \le j\le m,\,1 \le k\le m,\, j\ne k\right\} \), let \(g\,:\, {\mathbb {R}}\rightarrow {\mathbb {R}}^+\) be a standard logistic function \(g(x)=e^x/(1+e^x)\) and propose moves from p to \({\tilde{p}}\) by
$$\begin{aligned} {\tilde{p}}= g( g^{-1}(p) +Z ) \end{aligned}$$with \(Z\sim \text {Normal}(0,1)\) so \({\tilde{p}}\) has a LogitNormal distribution. Denote \(\pmb {w}\) as the vector \(\pmb {p}\) with entry value p changed to \({\tilde{p}}\). Finally we do a normalization
$$\begin{aligned} \tilde{\pmb {p}} = \pmb {w}/\sum _{k=1}^m w_k. \end{aligned}$$The transformation \( f(x_1,\ldots ,x_m) = \left( \frac{x_1}{\sum _{i=1}^m x_i} , \ldots ,\frac{x_{m-1}}{\sum _{i=1}^m x_i},\sum _{i=1}^m x_i \right) \) is known to have Jacobian \(\left( \sum _{i=1}^m x_i \right) ^{-m+1}\). So the corresponding transition kernel is given by \(q(\tilde{\pmb {p}}\,|\,\pmb {p})\propto \exp \left( -0.5(\text {logit}(p_k)-\text {logit}({\tilde{p}}))^2\right) \frac{ \left( {\tilde{p}} + \sum _{i\ne k}p_i \right) ^{m-1}}{{\tilde{p}}(1-{\tilde{p}})}\). Note that we do not marginalize the random variable \(S=\sum _{i=1}^m w_i\) so we have to draw simulations of the auxiliary variable \({\tilde{p}}\) to evaluate the above conditional density.
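The three moves above can be sketched in Python as follows (a minimal illustration under the stated distributions; the function names and the tuning values `sd` and `c` are our own choices, not the authors' implementation):

```python
import numpy as np
from scipy.stats import dirichlet

rng = np.random.default_rng(0)

def propose_rate(lam, sd=0.05):
    """LogNormal random-walk move for a rate lambda > 0.
    Also returns log q(lam | lam_new) - log q(lam_new | lam),
    which for this walk on the log scale reduces to log(lam_new/lam)."""
    lam_new = np.exp(np.log(lam) + sd * rng.standard_normal())
    return lam_new, np.log(lam_new) - np.log(lam)

def propose_row(p, c=100.0):
    """Dirichlet(1 + c*p) proposal for an off-diagonal row p of the
    transition matrix, so the proposal has mode p; also returns the
    log proposal-density ratio for the acceptance probability."""
    p_new = rng.dirichlet(1.0 + c * p)
    log_ratio = (dirichlet.logpdf(p, 1.0 + c * p_new)
                 - dirichlet.logpdf(p_new, 1.0 + c * p))
    return p_new, log_ratio

def propose_entry(p, k, sd=0.05):
    """LogitNormal move on entry k of row p, then renormalise the row."""
    w = p.copy()
    z = np.log(p[k] / (1.0 - p[k])) + sd * rng.standard_normal()
    w[k] = 1.0 / (1.0 + np.exp(-z))
    return w / w.sum()
```

In each case the returned log proposal ratio would enter the Metropolis–Hastings acceptance probability together with the likelihood and prior ratios.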
Appendix B: Second simulation study details
For a generator matrix
$$\begin{aligned} G = \begin{pmatrix} -\lambda _1 &{} \lambda _1 \\ \lambda _2 &{} -\lambda _2 \end{pmatrix} \end{aligned}$$with \(\lambda _1,\lambda _2>0\), we have explicitly that
$$\begin{aligned} e^{Gt} = \frac{1}{\lambda _1+\lambda _2} \begin{pmatrix} \lambda _2+\lambda _1 e^{-(\lambda _1+\lambda _2)t} &{} \lambda _1\left( 1-e^{-(\lambda _1+\lambda _2)t}\right) \\ \lambda _2\left( 1-e^{-(\lambda _1+\lambda _2)t}\right) &{} \lambda _1+\lambda _2 e^{-(\lambda _1+\lambda _2)t} \end{pmatrix}. \end{aligned}$$For discrete observations over a time mesh \(\Delta k\), with \(\Delta \in {\mathbb {R}}^+\) and \(k \in \{1,\ldots , n\}\) for some \(n\in {\mathbb {N}}\), of the Markov chain associated with G, let \(n^{eq}_i\) be the number of times the state remains equal for transitions starting in state i, and \(n^{ch}_i\) the number of times the state changes for transitions starting in state i along the time mesh, with \(i\in \{1,2\}\). The likelihood is readily seen to be
$$\begin{aligned} L(\lambda _1,\lambda _2) = \prod _{i=1}^{2} \left( e^{G\Delta }\right) _{i,i}^{\,n^{eq}_i} \left( e^{G\Delta }\right) _{i,3-i}^{\,n^{ch}_i}. \end{aligned}$$
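This two-state likelihood can be evaluated numerically as follows (a sketch; `two_state_loglik` is a name of our own, and `scipy.linalg.expm` stands in for the closed-form transition probabilities of the generator with rates \(\lambda _1,\lambda _2\)):

```python
import numpy as np
from scipy.linalg import expm

def two_state_loglik(lam1, lam2, delta, n_eq, n_ch):
    """Log-likelihood of a discretely observed two-state CTMC.
    n_eq[i] / n_ch[i]: number of mesh transitions out of state i
    (0-indexed) that keep / change the state over a step delta."""
    G = np.array([[-lam1, lam1], [lam2, -lam2]])
    P = expm(G * delta)  # transition matrix over one mesh step
    return (n_eq[0] * np.log(P[0, 0]) + n_ch[0] * np.log(P[0, 1])
            + n_eq[1] * np.log(P[1, 1]) + n_ch[1] * np.log(P[1, 0]))
```

As a sanity check, `P[0, 0]` agrees with the standard closed-form entry \((\lambda _2+\lambda _1 e^{-(\lambda _1+\lambda _2)\Delta })/(\lambda _1+\lambda _2)\).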
Simulations for the above CTMC with \((\lambda _1,\lambda _2)=(1,1)\) and \((\lambda _1,\lambda _2)=(2,1)\) were drawn. The prior distributions were \(\lambda \sim \text {Gamma}(1,1)\) and \(\left\{ p_{j,k} \right\} _{j\ne k}\sim \text {Dirichlet}(\pmb {1})\) for all the experiments. A CTMC was generated until 1000 transitions were obtained and we considered the discrete observations given by times \(\Delta k\) with \(\Delta =1\) and \(1\le k \le \min \left\{ n \, ;\ X(n\Delta )\text { was fully observed}\right\} \).
In Figs. 6 and 7 we compare the true posterior distributions given the simulated data with the realizations of \(\lambda _1\) and \(\lambda _2\) from Gibbs sampler 1, while in Figs. 8 and 9 we do the same experiment for Gibbs sampler 2, and in Figs. 10 and 11 for the Metropolis–Hastings sampler. The choice of priors was as indicated above for all three samplers. The variance of the LogNormal proposals for \(\lambda _i\) in the Metropolis–Hastings sampler was set to 0.0025. We ran 50000 iterations of each sampler after having diagnosed stationarity via the potential scale reduction factor of Gelman and Rubin (1992). We highlight that the run time for the Metropolis–Hastings algorithm is significantly shorter than for the Gibbs samplers, as shown in Tables 1 and 2. This is due to the computational cost of simulating the conditioned trajectories between discretely observed times for the CTMC.
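The stationarity diagnostic mentioned above can be sketched as follows (a minimal version of the Gelman and Rubin (1992) potential scale reduction factor, without the degrees-of-freedom correction; not the exact implementation used for the paper):

```python
import numpy as np

def psrf(chains):
    """Gelman-Rubin potential scale reduction factor.
    chains: array of shape (m, n), m parallel chains of length n,
    all started from dispersed initial values."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)        # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()  # within-chain variance
    var_plus = (n - 1) / n * W + B / n     # pooled variance estimate
    return np.sqrt(var_plus / W)           # ~1 at stationarity
```

Values close to 1 across all parameters are taken as evidence that the chains have reached stationarity.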
Appendix C: Further comparisons for the third simulation study
In this appendix we present further mean-generator comparisons between the Gibbs and Metropolis–Hastings samplers for discretely observed CTMCs. We also present comparisons of generators fitted with the EM algorithm and with our Metropolis–Hastings approach.
1.1 C.1: Gibbs and Metropolis–Hastings comparisons
In Figs. 13 and 15 we present the mean generator comparisons for the Gibbs and Metropolis–Hastings samplers, with simulation studies determined, respectively, by \(\Delta = 0.5\) and \(\Delta = 2\) (Figs. 12, 13, 14, 15, 16, 17, 18, 19, 20).
As mentioned in the main document, the entry-wise distance between the Gibbs mean generator and the true generator tends to be greater than that between the Metropolis–Hastings mean generator and the true generator; this is particularly visible in the mean generators for \(\Delta =0.5\), as shown in Table 1 of the main document in terms of Frobenius distances.
1.2 C.2: EM and Metropolis–Hastings comparison
We use the EM algorithm of the ctmcd R package, see Pfeuffer (2017), to compare with the proposed Metropolis–Hastings algorithm in the context of the third simulation study.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Riva-Palacio, A., Mena, R.H. & Walker, S.G. On the estimation of partially observed continuous-time Markov chains. Comput Stat 38, 1357–1389 (2023). https://doi.org/10.1007/s00180-022-01273-w