
Total reward criteria for unconstrained/constrained continuous-time Markov decision processes. (English) Zbl 1254.90290

Summary: This paper studies denumerable continuous-time Markov decision processes with expected total reward criteria. The authors first study the unconstrained model with possibly unbounded transition rates and give suitable conditions on the controlled system's primitive data under which they show the existence of a solution to the total reward optimality equation and the existence of an optimal stationary policy. They then impose a constraint on an expected total cost and consider the associated constrained model. Based on the results for the unconstrained model and using the Lagrange multiplier approach, they prove the existence of constrained-optimal policies under some additional conditions. Finally, they apply the results to controlled queueing systems.
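The total reward optimality equation and the Lagrangian mentioned above can be sketched in standard CTMDP notation (a hedged reconstruction for orientation only; the symbols S, A(i), q(j|i,a), r(i,a), u, V, C, c, and λ are generic notation for such models, not taken from the paper, whose precise conditions and formulation may differ):

\[
\sup_{a \in A(i)} \Bigl[ r(i,a) + \sum_{j \in S} q(j \mid i,a)\, u(j) \Bigr] = 0, \qquad i \in S,
\]

where q(j|i,a) are the (possibly unbounded) transition rates and u is a candidate value function. For the constrained model, the Lagrange multiplier approach typically studies, for λ ≥ 0, the unconstrained problem with reward

\[
V_\lambda(\pi) \;=\; V(\pi) \;-\; \lambda \bigl( C(\pi) - c \bigr),
\]

where V(π) is the expected total reward of policy π, C(π) the expected total cost, and c the constraint level; a suitable choice of λ then yields a constrained-optimal policy.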

MSC:

90C40 Markov and semi-Markov decision processes
Full Text: DOI

References:

[1] X. P. Guo, O. Hernández-Lerma, and T. Prieto-Rumeau, A survey of recent results on continuous-time Markov decision processes, TOP, 2006, 14(2): 177–246. · Zbl 1278.90427 · doi:10.1007/BF02837562
[2] X. P. Guo and O. Hernández-Lerma, Continuous-time controlled Markov chains, Ann. Appl. Probab., 2003, 13: 363–388. · Zbl 1049.60067 · doi:10.1214/aoap/1042765671
[3] X. P. Guo and O. Hernández-Lerma, Continuous-time controlled Markov chains with discounted rewards, Acta Appl. Math., 2003, 79: 195–216. · Zbl 1043.93067 · doi:10.1023/B:ACAP.0000003675.06200.45
[4] X. P. Guo and O. Hernández-Lerma, Constrained continuous-time Markov controlled processes with discounted criteria, Stochastic Anal. Appl., 2003, 21(2): 379–399. · Zbl 1099.90071 · doi:10.1081/SAP-120019291
[5] O. Hernández-Lerma and T. E. Govindan, Nonstationary continuous-time Markov control processes with discounted costs on infinite horizon, Acta Appl. Math., 2001, 67: 277–293. · Zbl 1160.93397 · doi:10.1023/A:1011970418845
[6] M. L. Puterman, Markov Decision Processes, Wiley, New York, 1994.
[7] X. P. Guo and X. R. Cao, Optimal control of ergodic continuous-time Markov chains with average sample-path rewards, SIAM J. Control Optim., 2005, 44(1): 29–48. · Zbl 1116.90108 · doi:10.1137/S0363012903420875
[8] X. P. Guo and O. Hernández-Lerma, Drift and monotonicity conditions for continuous-time controlled Markov chains with an average criterion, IEEE Trans. Automat. Control, 2003, 48(2): 236–245. · Zbl 1364.90346 · doi:10.1109/TAC.2002.808469
[9] X. P. Guo and K. Liu, A note on optimality conditions for continuous-time Markov decision processes with average cost criterion, IEEE Trans. Automat. Control, 2001, 46: 1984–1988. · Zbl 1017.90120 · doi:10.1109/9.975505
[10] X. P. Guo and W. P. Zhu, Denumerable state continuous-time Markov decision processes with unbounded cost and transition rates under average criterion, ANZIAM J., 2002, 43: 541–551. · Zbl 1024.90067
[11] P. Kakumanu, Non-discounted continuous-time Markov decision processes with countable state space, SIAM J. Control, 1972, 10: 210–220. · Zbl 0271.60066 · doi:10.1137/0310016
[12] M. E. Lewis and M. L. Puterman, A note on bias optimality in controlled queueing systems, J. Appl. Probab., 2000, 37: 300–305. · Zbl 1018.90009 · doi:10.1239/jap/1014842288
[13] T. Prieto-Rumeau and O. Hernández-Lerma, Ergodic control of continuous-time Markov chains with pathwise constraints, SIAM J. Control Optim., 2008, 47(4): 1888–1908. · Zbl 1165.93040 · doi:10.1137/060668857
[14] T. Prieto-Rumeau, Blackwell optimality in the class of Markov policies for continuous-time controlled Markov chains, Acta Appl. Math., 2006, 92: 77–96. · Zbl 1108.93080 · doi:10.1007/s10440-006-9060-3
[15] T. Prieto-Rumeau and O. Hernández-Lerma, The Laurent series, sensitive discount and Blackwell optimality for continuous-time controlled Markov chains, Math. Meth. Oper. Res., 2005, 61: 123–145. · Zbl 1077.93055 · doi:10.1007/s001860400393
[16] T. Prieto-Rumeau and O. Hernández-Lerma, Bias optimality for continuous-time controlled Markov chains, SIAM J. Control Optim., 2006, 45: 51–73. · Zbl 1134.93049 · doi:10.1137/S036301290343432
[17] D. P. Bertsekas, Dynamic Programming and Optimal Control, 2nd Edition, Athena Scientific, Belmont, 2001. · Zbl 1083.90044
[18] O. Hernández-Lerma and J. B. Lasserre, Further Topics on Discrete-Time Markov Control Processes, Springer-Verlag, New York, 1999. · Zbl 0928.93002
[19] E. Altman, Constrained Markov Decision Processes, Chapman and Hall/CRC, Boca Raton, FL, 1999. · Zbl 0963.90068
[20] J. Alvarez-Mena and O. Hernández-Lerma, Convergence of the optimal values of constrained Markov control processes, Math. Meth. Oper. Res., 2002, 55(3): 461–484. · Zbl 1031.90058 · doi:10.1007/s001860200209
[21] A. B. Piunovskiy, Optimal Control of Random Sequences in Problems with Constraints, Kluwer, Dordrecht, 1997. · Zbl 0894.93001
[22] Y. Serin and V. Kulkarni, Markov decision processes under observability constraints, Math. Meth. Oper. Res., 2005, 61: 311–328. · Zbl 1125.90421 · doi:10.1007/s001860400402
[23] W. J. Anderson, Continuous-Time Markov Chains, Springer-Verlag, New York, 1991. · Zbl 0731.60067
[24] K. L. Chung, Markov Chains with Stationary Transition Probabilities, Springer-Verlag, Berlin, 1960. · Zbl 0092.34304
[25] L. E. Ye, X. P. Guo, and O. Hernández-Lerma, Existence and regularity of a nonhomogeneous transition matrix under measurability conditions, J. Theor. Probab., 2008, 21: 604–627. · Zbl 1147.60050 · doi:10.1007/s10959-008-0163-9
[26] F. J. Beutler and K. W. Ross, Optimal policies for controlled Markov chains with a constraint, J. Math. Anal. Appl., 1985, 112: 236–252. · Zbl 0581.93067 · doi:10.1016/0022-247X(85)90288-4
[27] X. P. Guo, Constrained nonhomogeneous Markov decision processes with expected total reward criterion, Acta Appl. Math. Sin., English Ser., 2000, 23: 230–235.
[28] L. L. Zhang and X. P. Guo, Constrained continuous-time Markov decision processes with average criteria, Math. Meth. Oper. Res., 2007, 67: 323–340. · Zbl 1143.90033 · doi:10.1007/s00186-007-0154-0
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.