
Average optimality for continuous-time Markov decision processes in Polish spaces. (English) Zbl 1160.90010

Summary: This paper is devoted to studying average optimality in continuous-time Markov decision processes with fairly general state and action spaces. The criterion to be maximized is the expected average reward. The transition rates of the underlying continuous-time jump Markov processes are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. We first provide two optimality inequalities with opposite directions, and give suitable conditions under which the existence of solutions to these two inequalities is ensured. Then, from the two optimality inequalities we prove the existence of optimal (deterministic) stationary policies by using the Dynkin formula. Moreover, we present a “semimartingale characterization” of an optimal stationary policy. Finally, we use a generalized potlatch process with control to illustrate the difference between our conditions and those in the previous literature, and then further apply our results to average optimal control problems for generalized birth-death systems, upwardly skip-free processes and two queueing systems. The approach developed in this paper differs slightly from the “optimality inequality approach” widely used in the previous literature.
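To make the average-reward criterion concrete in the simplest setting the paper covers, the following is a minimal sketch for a controlled birth-death system on a finite truncated state space, not the paper's general Polish-space model: under a deterministic stationary policy the chain's stationary distribution follows from detailed balance, and the long-run expected average reward is the stationary-distribution-weighted reward rate. All rates, the cost structure, and the truncation level are illustrative assumptions.

```python
# Hedged sketch: expected average reward of deterministic stationary policies
# in a small controlled birth-death chain (finite truncation {0, ..., N}).
# This is an illustration of the criterion only, not the paper's method.

from itertools import product

N = 5                      # truncation level (illustrative assumption)
ACTIONS = (0, 1)           # two hypothetical service modes

def lam(i, a):             # birth (arrival) rate; here independent of the action
    return 2.0 if i < N else 0.0

def mu(i, a):              # death (service) rate, controlled by the action
    return (1.0 + 2.0 * a) if i > 0 else 0.0

def reward(i, a):          # reward rate: negative holding cost minus effort cost
    return -(1.0 * i + 0.5 * a)

def average_reward(policy):
    """Stationary distribution via detailed balance pi[i]*lam = pi[i+1]*mu,
    then the pi-weighted reward rate (the expected average reward)."""
    w = [1.0]
    for i in range(N):
        w.append(w[-1] * lam(i, policy[i]) / mu(i + 1, policy[i + 1]))
    total = sum(w)
    pi = [x / total for x in w]
    return sum(pi[i] * reward(i, policy[i]) for i in range(N + 1))

# Enumerate all deterministic stationary policies and keep the best one.
best = max(product(ACTIONS, repeat=N + 1), key=average_reward)
print(best, average_reward(best))
```

In this toy model the action in state 0 has no dynamic effect (the arrival rate is action-independent and there is no service at an empty queue), so an optimal policy avoids paying the effort cost there; in the paper's general setting such policies are obtained from the two optimality inequalities rather than by enumeration.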

MSC:

90C40 Markov and semi-Markov decision processes
93E20 Optimal stochastic control

References:

[1] Anderson, W. J. (1991). Continuous-Time Markov Chains . Springer, New York. · Zbl 0731.60067
[2] Borkar, V. (1989). Optimal Control of Diffusion Processes . Longman Sci. Tech., Harlow. · Zbl 0669.93065
[3] Chen, M. F. (2000). Equivalence of exponential ergodicity and \(L^2\)-exponential convergence for Markov chains. Stochastic Process. Appl. 87 281–297. · Zbl 1045.60078 · doi:10.1016/S0304-4149(99)00114-3
[4] Chen, M. F. (2004). From Markov Chains to Non-Equilibrium Particle Systems , 2nd ed. World Scientific Publishing, River Edge, NJ. · Zbl 1078.60003
[5] Doshi, B. T. (1976). Continuous-time control of Markov processes on an arbitrary state space: Average return criterion. Ann. Statist. 4 1219–1235. · Zbl 0345.93073 · doi:10.1214/aos/1176343653
[6] Dong, Z. Q. (1979). Continuous time Markov decision programming with average reward criterion—countable state and action space. Sci. Sinica Special Issue (II) 141–148.
[7] Down, D., Meyn, S. P. and Tweedie, R. L. (1995). Exponential and uniform ergodicity of Markov processes. Ann. Probab. 23 1671–1691. · Zbl 0852.60075 · doi:10.1214/aop/1176987798
[8] Feller, W. (1940). On the integro-differential equations of purely discontinuous Markoff processes. Trans. Amer. Math. Soc. 48 488–515. · Zbl 0025.34704 · doi:10.2307/1990095
[9] Fleming, W. H. and Soner, H. M. (1993). Controlled Markov Processes and Viscosity Solutions . Springer, New York. · Zbl 0773.60070
[10] Gihman, I. I. and Skorohod, A. V. (1979). Controlled Stochastic Processes . Springer, New York. · Zbl 0411.93029
[11] Guo, X. P. (2003). Continuous-time Markov decision processes with discounted rewards: The case of Polish spaces. Unpublished manuscript.
[12] Guo, X. P. and Cao, X.-R. (2005). Optimal control of ergodic continuous-time Markov chains with average sample-path rewards. SIAM J. Control Optim. 44 29–48. · Zbl 1116.90108 · doi:10.1137/S0363012903420875
[13] Guo, X. P. and Hernández-Lerma, O. (2003). Drift and monotonicity conditions for continuous-time controlled Markov chains with an average criterion. IEEE Trans. Automat. Control 48 236–245. · Zbl 1364.90346 · doi:10.1109/TAC.2002.808469
[14] Guo, X. P. and Hernández-Lerma, O. (2003). Continuous-time controlled Markov chains. Ann. Appl. Probab. 13 363–388. · Zbl 1049.60067 · doi:10.1214/aoap/1042765671
[15] Guo, X. P. and Hernández-Lerma, O. (2003). Zero-sum games for continuous-time Markov chains with unbounded transition and average payoff rates. J. Appl. Probab. 40 327–345. · Zbl 1071.91008 · doi:10.1239/jap/1053003547
[16] Guo, X. P. and Liu, K. (2001). A note on optimality conditions for continuous-time Markov decision processes with average cost criterion. IEEE Trans. Automat. Control 46 1984–1989. · Zbl 1017.90120 · doi:10.1109/9.975505
[17] Guo, X. P. and Zhu, W. P. (2002). Denumerable state continuous time Markov decision processes with unbounded cost and transition rates under average criterion. ANZIAM J. 43 541–557. · Zbl 1024.90067
[18] Haviv, M. and Puterman, M. L. (1998). Bias optimality in controlled queuing systems. J. Appl. Probab. 35 136–150. · Zbl 0903.90176 · doi:10.1239/jap/1032192558
[19] Hernández-Lerma, O. (1994). Lectures on Continuous-Time Markov Control Processes . Soc. Mat. Mexicana, México. · Zbl 0866.93102
[20] Hernández-Lerma, O. and Lasserre, J. B. (1996). Discrete-Time Markov Control Processes . Springer, New York. · Zbl 0853.93106
[21] Hernández-Lerma, O. and Lasserre, J. B. (1999). Further Topics on Discrete-Time Markov Control Processes . Springer, New York. · Zbl 0928.93002
[22] Holley, R. and Liggett, T. M. (1981). Generalized potlatch and smoothing processes. Z. Wahrsch. Verw. Gebiete 55 165–195. · Zbl 0441.60096 · doi:10.1007/BF00535158
[23] Howard, R. A. (1960). Dynamic Programming and Markov Processes . Wiley, New York. · Zbl 0091.16001
[24] Kakumanu, P. (1972). Nondiscounted continuous-time Markov decision processes with countable state and action spaces. SIAM J. Control 10 210–220. · Zbl 0271.60066 · doi:10.1137/0310016
[25] Kitaev, M. Y. and Rykov, V. V. (1995). Controlled Queueing Systems . CRC Press, Boca Raton, FL. · Zbl 0876.60077
[26] Lembersky, M. R. (1974). On maximal rewards and \(\varepsilon\)-optimal policies in continuous time Markov chains. Ann. Statist. 2 159–169. · Zbl 0272.90083 · doi:10.1214/aos/1176342621
[27] Lewis, M. E. and Puterman, M. L. (2000). A note on bias optimality in controlled queueing systems. J. Appl. Probab. 37 300–305. · Zbl 1018.90009 · doi:10.1239/jap/1014842288
[28] Lund, R. B., Meyn, S. P. and Tweedie, R. L. (1996). Computable exponential convergence rates for stochastically ordered Markov processes. Ann. Appl. Probab. 6 218–237. · Zbl 0863.60093 · doi:10.1214/aoap/1034968072
[29] Meyn, S. P. and Tweedie, R. L. (1993). Stability of Markovian processes III: Foster–Lyapunov criteria for continuous-time processes. Adv. in Appl. Probab. 25 518–548. · Zbl 0781.60053 · doi:10.2307/1427522
[30] Miller, R. L. (1968). Finite state continuous time Markov decision processes with an infinite planning horizon. J. Math. Anal. Appl. 22 552–569. · Zbl 0157.50301 · doi:10.1016/0022-247X(68)90194-7
[31] Puterman, M. L. (1994). Markov Decision Processes . Wiley, New York. · Zbl 0829.90134
[32] Rao, M. M. (1995). Stochastic Processes: General Theory . Kluwer, Dordrecht. · Zbl 0841.60002
[33] Rieder, U. (1978). Measurable selection theorems for optimization problems. Manuscripta Math. 24 115–131. · Zbl 0385.28005 · doi:10.1007/BF01168566
[34] Sennott, L. I. (1999). Stochastic Dynamic Programming and the Control of Queueing Systems . Wiley, New York. · Zbl 0997.93503
[35] Song, J. S. (1987). Continuous time Markov decision programming with non-uniformly bounded transition rates. Scientia Sinica 12 1258–1267.
[36] Tweedie, R. L. (1981). Criteria for ergodicity, exponential ergodicity and strong ergodicity of Markov processes. J. Appl. Probab. 18 122–130. · Zbl 0458.60070 · doi:10.2307/3213172
[37] Widder, D. V. (1941). The Laplace Transform. Princeton Univ. Press. · Zbl 0063.08245
[38] Williams, D. (1979). Diffusions, Markov Processes, and Martingales . Wiley, New York. · Zbl 0402.60003
[39] Yushkevich, A. A. and Feinberg, E. A. (1979). On homogeneous Markov model with continuous time and finite or countable state space. Theory Probab. Appl. 24 156–161. · Zbl 0437.93033 · doi:10.1137/1124014
[40] Zeifman, A. I. (1991). Some estimates of the rate of convergence for birth and death processes. J. Appl. Probab. 28 268–277. · Zbl 0738.60088 · doi:10.2307/3214865
[41] Zheng, S. H. (1991). Continuous time Markov decision programming with average reward criterion and unbounded reward rates. Acta Math. Appl. Sinica 7 6–16. · Zbl 0752.90083 · doi:10.1007/BF02080199