Abstract
This paper studies denumerable continuous-time Markov decision processes with the expected total reward criterion. The authors first study the unconstrained model with possibly unbounded transition rates, and give suitable conditions on the controlled system's primitive data under which they show the existence of a solution to the total reward optimality equation and of an optimal stationary policy. They then impose a constraint on an expected total cost and consider the associated constrained model. Based on the results for the unconstrained model and the Lagrange multiplier approach, the authors prove the existence of constrained-optimal policies under some additional conditions. Finally, the results are applied to controlled queueing systems.
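The total reward optimality equation mentioned above, sup_a [r(i,a) + Σ_j q(j|i,a)V(j)] = 0, can be illustrated numerically on a small queueing example. The sketch below is a hypothetical clearing queue (no arrivals, state 0 absorbing and reward-free) with bounded rates, chosen purely for illustration and not the paper's exact model; all parameter values are assumptions. After uniformization with constant Λ, the equation becomes a discrete-time fixed point V(i) = max_a [r(i,a)/Λ + (a/Λ)V(i-1) + (1 - a/Λ)V(i)], which value iteration solves:

```python
import numpy as np

# Hypothetical clearing queue: states 0..N jobs, state 0 absorbing.
# Action a = chosen service rate; reward rate R*a (expected lump reward R
# per departure) minus holding cost h*i and service cost c*a.
N = 10
rates = [1.0, 3.0]        # available service rates (assumed)
h, c, R = 1.0, 0.5, 2.0   # holding cost, rate cost, reward per departure
Lam = max(rates)          # uniformization constant

V = np.zeros(N + 1)       # V[0] = 0: absorbing state earns nothing
for _ in range(10_000):
    V_new = np.zeros(N + 1)
    for i in range(1, N + 1):
        best = -np.inf
        for a in rates:
            r = R * a - h * i - c * a          # reward rate r(i, a)
            # uniformized optimality equation: serve (go to i-1) with
            # prob a/Lam, otherwise a fictitious self-loop
            val = r / Lam + (a / Lam) * V[i - 1] + (1 - a / Lam) * V[i]
            best = max(best, val)
        V_new[i] = best
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

# greedy (stationary) policy attained in the optimality equation
policy = [max(rates, key=lambda a: (R*a - h*i - c*a) / Lam
              + (a/Lam) * V[i-1] + (1 - a/Lam) * V[i])
          for i in range(1, N + 1)]
```

Because every state drains to the absorbing state 0, the total expected reward is finite and the iteration converges without discounting; the greedy policy read off from the fixed point is stationary, matching the kind of optimal stationary policy whose existence the paper establishes under its conditions.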
Additional information
This research is supported by the National Natural Science Foundation of China under Grant Nos. 10925107 and 60874004.
This paper was recommended for publication by Editor Guohua ZOU.
Cite this article
Guo, X., Zhang, L. Total reward criteria for unconstrained/constrained continuous-time Markov decision processes. J Syst Sci Complex 24, 491–505 (2011). https://doi.org/10.1007/s11424-011-8004-9