×

Optimal decisions for continuous time Markov decision processes over finite planning horizons. (English) Zbl 1391.90631

Summary: The computation of \(\epsilon\)-optimal policies for continuous time Markov decision processes (CTMDPs) over finite time intervals is a sophisticated problem because the optimal policy may change at arbitrary times. Numerical algorithms based on time discretization or uniformization have been proposed for the computation of optimal policies. The uniformization based algorithm has shown to be more reliable and often also more efficient but is currently only available for processes where the gain or reward does not depend on the decision taken in a state. In this paper, we present two new uniformization based algorithms for computing \(\epsilon\)-optimal policies for CTMDPs with decision dependent rewards over a finite time horizon. Due to a new and tighter upper bound the newly proposed algorithms cannot only be applied for decision dependent rewards, they also outperform the available approach for rewards that do not depend on the decision. In particular, for models where the policy only rarely changes, optimal policies can be computed much faster.

MSC:

90C40 Markov and semi-Markov decision processes
93E20 Optimal stochastic control
60K25 Queueing theory (aspects of probability theory)
65C40 Numerical analysis or methods applied to Markov chains
Full Text: DOI

References:

[1] Bertsekas, D. P., Dynamic programming and optimal control, vol. I, (2005), Athena Scientific Belmont, Massachusetts · Zbl 1125.90056
[2] Bertsekas, D. P., Dynamic programming and optimal control, vol. II, (2007), Athena Scientific Belmont, Massachusetts
[3] Puterman, M. L., Markov decision processes, (2005), Wiley Hoboken, New Jersey · Zbl 1184.90170
[4] Jensen, A., Markoff chains as an aid in the study of markoff processes, skand, Aktuarietidskr, 36, 87-91, (1953) · Zbl 0051.35607
[5] Gross, D.; Miller, D., The randomization technique as a modeling tool and solution procedure for transient Markov processes, Oper Res, 32, 2, 926-944, (1984) · Zbl 0546.90038
[6] Martin-Löfs, A., Optimal control of a continuous-time Markov chain with periodic transition probabilities, Oper Res, 15, 872-881, (1967) · Zbl 0149.38104
[7] Miller, B. L., Finite state continuous time Markov decision processes with a finite planning horizon, SIAM J Control, 6, 2, 266-280, (1968) · Zbl 0162.23302
[8] Buchholz, P.; Schulz, I., Numerical analysis of continuous time Markov decision processes over finite horizons, Comput OR, 38, 3, 651-659, (2011) · Zbl 1206.90206
[9] Yasuda, M., On the existence of optimal control in continuous time Markov decision processes, Bull Math Stat, 15, 7-17, (1972) · Zbl 0242.93062
[10] Lembersky, M. R., On maximal rewards and ϵ-optimal policies in continuous time Markov decision chains, Ann Stat, 2, 1, 159-169, (1974) · Zbl 0272.90083
[11] Stewart, W. J., Introduction to the numerical solution of Markov chains, (1994), Princeton University Press Princeton, New Jersey · Zbl 0821.65099
[12] Rindos, A.; Woolet, S.; Viniotis, I.; Trivedi, K., Exact methods for the transient analysis of nonhomogeneous continuous time Markov chains, (Stewart, W. J., Computations with Markov chains, (1995), Kluwer Academic Publishers Boston, London, Dordrecht), 121-133 · Zbl 0862.60060
[13] Arns, M.; Buchholz, P.; Panchenko, A., On the numerical analysis of inhomogeneous continuous time Markov chains, INFORMS J Comput, 22, 3, 416-432, (2010) · Zbl 1243.65015
[14] Lippman, S. A., Applying a new device in the optimization of exponential queuing systems, Oper Res, 23, 4, 687-710, (1975) · Zbl 0312.60048
[15] Lippman, S. A., Countable-state, continuous-time dynamic programming with structure, Oper Res, 24, 3, 477-490, (1976) · Zbl 0353.90092
[16] Bhatnagar, S.; Abdulla, M. S., Simulation-based optimization algorithms for finite-horizon Markov decision processes, Simulation, 84, 12, 577-600, (2008)
[17] Baier, C.; Haverkort, B. R.; Hermanns, H.; Katoen, J., Model-checking algorithms for continuous-time Markov chains, IEEE Trans Softw Eng, 29, 6, 524-541, (2003)
[18] Baier, C.; Hermanns, H.; Katoen, J. P.; Haverkort, B. R., Efficient computation of time-bounded reachability probabilities in uniform continuous-time Markov decision processes, Theor Comput Sci, 345, 1, 2-26, (2005) · Zbl 1081.90066
[19] Buchholz P, Hahn EM, Hermanns H, Zhang L. Model checking algorithms for CTMDPs. In: Computer aided verification—23rd international conference, CAV 2011, Snowbird, UT, USA, July 14-20, 2011. Proceedings; 2011. p. 225-242.
[20] Fearnley, J.; Rabe, M. N.; Schewe, S.; Zhang, L., Efficient approximation of optimal control for continuous-time Markov games, Inf Comput, 247, 106-129, (2016) · Zbl 1337.91015
[21] Gallager, R., Stochastic processes theory for applications, (2014), Cambridge University Press Cambridge
[22] Serfozo, R. F., An equivalence between continuous and discrete time Markov decision processes, Oper Res, 27, 3, 616-620, (1979) · Zbl 0413.90079
[23] Press, W. H.; Teukolsky, S. A.; Vetterling, W. T.; Flannery, B. P., Numerical recipes in C, (1992), Cambridge University Press Cambridge · Zbl 0778.65003
[24] Gandhi, A.; Gupta, V.; Harchol-Balter, M.; Kozuch, M. A., Optimality analysis of energy-performance trade-off for server farm management, Perform Eval, 67, 11, 1155-1171, (2010)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.