
Constrained continuous-time Markov decision processes with average criteria. (English) Zbl 1143.90033

Constrained continuous-time Markov decision processes with a denumerable state space and unbounded reward/cost and transition rates are studied. The criterion to be maximized is the expected average reward, and a constraint is imposed on an expected average cost. The authors give suitable conditions that ensure the existence of a constrained-optimal policy. Moreover, they show that the constrained-optimal policy randomizes between two stationary policies differing in at most one state. A controlled queueing system is used to illustrate the obtained conditions.
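For orientation, such a constrained average-criterion problem can be sketched in generic notation as follows (the symbols $r$, $c$, $\rho$, $x(t)$, $a(t)$ below are illustrative and not taken from the paper):
\[
\text{maximize over admissible policies }\pi:\qquad
V(\pi)\;:=\;\liminf_{T\to\infty}\frac{1}{T}\,
\mathbb{E}^{\pi}_{x}\!\left[\int_{0}^{T} r\bigl(x(t),a(t)\bigr)\,dt\right],
\]
\[
\text{subject to}\qquad
C(\pi)\;:=\;\limsup_{T\to\infty}\frac{1}{T}\,
\mathbb{E}^{\pi}_{x}\!\left[\int_{0}^{T} c\bigl(x(t),a(t)\bigr)\,dt\right]\;\le\;\rho,
\]
where $x(t)$ is the controlled Markov chain, $a(t)$ the action process, $r$ and $c$ the (possibly unbounded) reward and cost rates, and $\rho$ the constraint level. In this notation, the result described above says that an optimal $\pi$ may be taken as a randomization between two stationary policies differing in at most one state.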

MSC:

90C40 Markov and semi-Markov decision processes
93E20 Optimal stochastic control
