
Reduction of discounted continuous-time MDPs with unbounded jump and reward rates to discrete-time total-reward MDPs. (English) Zbl 1374.90402

Hernández-Hernández, Daniel (ed.) et al., Optimization, control, and applications of stochastic systems. In honor of Onésimo Hernández-Lerma. Boston, MA: Birkhäuser (ISBN 978-0-8176-8336-8/hbk; 978-0-8176-8337-5/ebook). Systems and Control: Foundations and Applications, 77-97 (2012).
Summary: This chapter discusses a reduction of discounted continuous-time Markov decision processes (CTMDPs) to discrete-time Markov decision processes (MDPs). This reduction is based on the equivalence of a randomized policy that chooses actions only at jump epochs to a nonrandomized policy that can switch actions between jumps. For discounted CTMDPs with bounded jump rates, this reduction was introduced by the author in 2004 as a reduction to discounted MDPs. Here we show that this reduction also holds for unbounded jump and reward rates, but the corresponding MDP may not be discounted. However, the analysis of the equivalent total-reward MDP leads to the description of optimal policies for the CTMDP and provides methods for their computation.
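The mechanics of the reduction can be sketched by a standard uniformization-style computation (this sketch is not taken from the chapter itself; the symbols \(\alpha\), \(q\), \(r\), \(\Lambda\) are generic notation). For a CTMDP with discount rate \(\alpha > 0\), jump rate \(q(x,a)\), and reward rate \(r(x,a)\), integrating the discounted reward up to the first jump epoch \(\tau \sim \mathrm{Exp}(q(x,a))\) yields the one-step data of an equivalent discrete-time MDP:

```latex
% Expected discounted reward accumulated until the first jump from x under a:
\tilde r(x,a) \;=\; \int_0^\infty e^{-\alpha t}\, e^{-q(x,a)t}\, r(x,a)\, dt
            \;=\; \frac{r(x,a)}{\alpha + q(x,a)},
% Effective one-step discount factor applied at the jump epoch:
\beta(x,a) \;=\; \mathbb{E}\, e^{-\alpha \tau}
           \;=\; \int_0^\infty q(x,a)\, e^{-q(x,a)t}\, e^{-\alpha t}\, dt
           \;=\; \frac{q(x,a)}{\alpha + q(x,a)}.
```

When the jump rates are bounded, \(q(x,a) \le \Lambda < \infty\), the factor satisfies \(\beta(x,a) \le \Lambda/(\alpha+\Lambda) < 1\), so the resulting discrete-time MDP is discounted. When the jump rates are unbounded, \(\sup_{x,a} \beta(x,a) = 1\), and the equivalent model must instead be treated as a total-reward MDP, which is the situation the chapter addresses.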
For the entire collection see [Zbl 1253.00012].

MSC:

90C40 Markov and semi-Markov decision processes