
Discounted continuous-time constrained Markov decision processes in Polish spaces. (English) Zbl 1258.90104

Summary: This paper is devoted to studying constrained continuous-time Markov decision processes (MDPs) in the class of randomized policies depending on state histories. The transition rates may be unbounded, the reward and cost rates may be unbounded from above and from below, and the state and action spaces are Polish spaces. The optimality criterion to be maximized is the expected discounted reward, and the constraints are imposed on the expected discounted costs. First, we give conditions for the nonexplosion of the underlying processes and the finiteness of the expected discounted rewards/costs. Second, using a technique of occupation measures, we prove that the constrained optimality problem for continuous-time MDPs can be transformed into an equivalent optimality problem over a class of probability measures. Based on the equivalent problem and a so-called \(\overline w\)-weak convergence of probability measures developed in this paper, we show the existence of a constrained optimal policy. Third, by providing a linear programming formulation of the equivalent problem, we show the solvability of constrained optimal policies. Finally, we use two computable examples to illustrate our main results.
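For orientation, the constrained problem and the occupation-measure reduction described above can be sketched in standard notation (the symbols below are illustrative, not necessarily the paper's own): with discount rate \(\alpha > 0\), reward rate \(r\), cost rates \(c_n\), and constraint levels \(d_n\), the problem is

\[
\sup_{\pi}\ V(\pi) := \mathbb{E}^{\pi}\!\left[\int_0^\infty e^{-\alpha t}\, r(x_t, a_t)\, dt\right]
\quad \text{subject to} \quad
\mathbb{E}^{\pi}\!\left[\int_0^\infty e^{-\alpha t}\, c_n(x_t, a_t)\, dt\right] \le d_n, \quad n = 1, \dots, N.
\]

Defining the (normalized) discounted occupation measure of a policy \(\pi\) on the state-action space \(S \times A\) by

\[
\eta_{\pi}(B) := \alpha\, \mathbb{E}^{\pi}\!\left[\int_0^\infty e^{-\alpha t}\, \mathbf{1}_B(x_t, a_t)\, dt\right], \qquad B \subseteq S \times A \text{ measurable},
\]

both the objective and the constraints become linear functionals, e.g. \(V(\pi) = \alpha^{-1} \int_{S \times A} r \, d\eta_{\pi}\), so the constrained control problem becomes a linear program over a set of probability measures, which is the equivalent problem the paper analyzes.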

MSC:

90C40 Markov and semi-Markov decision processes
60J27 Continuous-time Markov processes on discrete state spaces

References:

[1] Altman, E. (1999). Constrained Markov Decision Processes . Chapman and Hall/CRC, Boca Raton, FL. · Zbl 0963.90068
[2] Altman, E. and Shwartz, A. (1991). Markov decision problems and state-action frequencies. SIAM J. Control Optim. 29 786-809. · Zbl 0733.90075 · doi:10.1137/0329043
[3] Ash, R. B. (2000). Probability and Measure Theory , 2nd ed. Academic Press, Burlington, MA. · Zbl 0944.60004
[4] Bertsekas, D. P. and Shreve, S. E. (1996). Stochastic Optimal Control : The Discrete Time Case . Athena Scientific, Belmont, MA. · Zbl 0471.93002
[5] Chen, M.-F. (2004). From Markov Chains to Non-equilibrium Particle Systems , 2nd ed. World Scientific, River Edge, NJ. · Zbl 1078.60003
[6] Chen, R. C. and Feinberg, E. A. (2007). Non-randomized policies for constrained Markov decision processes. Math. Methods Oper. Res. 66 165-179. · Zbl 1126.90074 · doi:10.1007/s00186-006-0133-x
[7] Feinberg, E. A. (2000). Constrained discounted Markov decision processes and Hamiltonian cycles. Math. Oper. Res. 25 130-140. · Zbl 1073.90567 · doi:10.1287/moor.25.1.130.15210
[8] Feinberg, E. A. and Shwartz, A. (1995). Constrained Markov decision models with weighted discounted rewards. Math. Oper. Res. 20 302-320. · Zbl 0837.90120 · doi:10.1287/moor.20.2.302
[9] Feinberg, E. A. and Shwartz, A. (1996). Constrained discounted dynamic programming. Math. Oper. Res. 21 922-945. · Zbl 0867.90123 · doi:10.1287/moor.21.4.922
[10] Feinberg, E. A. and Shwartz, A. (1999). Constrained dynamic programming with two discount factors: Applications and an algorithm. IEEE Trans. Automat. Control 44 628-631. · Zbl 0957.90127 · doi:10.1109/9.751365
[11] Guo, X. P. (2007). Constrained optimization for average cost continuous-time Markov decision processes. IEEE Trans. Automat. Control 52 1139-1143. · Zbl 1366.90217 · doi:10.1109/TAC.2007.899040
[12] Guo, X. P. (2007). Continuous-time Markov decision processes with discounted rewards: The case of Polish spaces. Math. Oper. Res. 32 73-87. · Zbl 1278.90426 · doi:10.1287/moor.1060.0210
[13] Guo, X. P. and Hernández-Lerma, O. (2003). Constrained continuous-time Markov control processes with discounted criteria. Stochastic Anal. Appl. 21 379-399. · Zbl 1099.90071 · doi:10.1081/SAP-120019291
[14] Guo, X. P. and Hernández-Lerma, O. (2003). Continuous-time controlled Markov chains. Ann. Appl. Probab. 13 363-388. · Zbl 1049.60067 · doi:10.1214/aoap/1042765671
[15] Guo, X. P. and Hernández-Lerma, O. (2009). Continuous-time Markov Decision Processes : Theory and Applications. Stochastic Modelling and Applied Probability 62 . Springer, Berlin. · Zbl 1209.90002
[16] Guo, X. P. and Rieder, U. (2006). Average optimality for continuous-time Markov decision processes in Polish spaces. Ann. Appl. Probab. 16 730-756. · Zbl 1160.90010 · doi:10.1214/105051606000000105
[17] Guo, X. P., Hernández-Lerma, O. and Prieto-Rumeau, T. (2006). A survey of recent results on continuous-time Markov decision processes. TOP 14 177-246. · Zbl 1278.90427 · doi:10.1007/BF02837562
[18] Haviv, M. and Puterman, M. L. (1998). Bias optimality in controlled queueing systems. J. Appl. Probab. 35 136-150. · Zbl 0903.90176 · doi:10.1239/jap/1032192558
[19] Hernández-Lerma, O. and González-Hernández, J. (2000). Constrained Markov control processes in Borel spaces: The discounted case. Math. Methods Oper. Res. 52 271-285. · Zbl 1032.90061 · doi:10.1007/s001860000071
[20] Hernández-Lerma, O., González-Hernández, J. and López-Martínez, R. R. (2003). Constrained average cost Markov control processes in Borel spaces. SIAM J. Control Optim. 42 442-468 (electronic). · Zbl 1049.90116 · doi:10.1137/S0363012999361627
[21] Hernández-Lerma, O. and Lasserre, J. B. (1996). Discrete-time Markov Control Processes : Basic Optimality Criteria. Applications of Mathematics ( New York ) 30 . Springer, New York. · Zbl 0840.93001
[22] Hernández-Lerma, O. and Lasserre, J. B. (1999). Further Topics on Discrete-time Markov Control Processes. Applications of Mathematics ( New York ) 42 . Springer, New York. · Zbl 0928.93002
[23] Hordijk, A. and Spieksma, F. (1989). Constrained admission control to a queueing system. Adv. in Appl. Probab. 21 409-431. · Zbl 0674.60086 · doi:10.2307/1427167
[24] Jacod, J. (1975). Multivariate point processes: Predictable projection, Radon-Nikodým derivatives, representation of martingales. Z. Wahrsch. Verw. Gebiete 31 235-253. · Zbl 0302.60032 · doi:10.1007/BF00536010
[25] Kadota, Y., Kurano, M. and Yasuda, M. (2006). Discounted Markov decision processes with utility constraints. Comput. Math. Appl. 51 279-284. · Zbl 1120.90066 · doi:10.1016/j.camwa.2005.11.013
[26] Kakumanu, P. (1971). Continuously discounted Markov decision model with countable state and action space. Ann. Math. Statist. 42 919-926. · Zbl 0234.93027 · doi:10.1214/aoms/1177693321
[27] Kitaev, M. Y. (1985). Semi-Markov and jump Markov controlled models: Average cost criterion. Theory Probab. Appl. 30 272-288. · Zbl 0586.90093 · doi:10.1137/1130036
[28] Kitaev, M. Y. and Rykov, V. V. (1995). Controlled Queueing Systems . CRC Press, Boca Raton, FL. · Zbl 0876.60077
[29] Kurano, M., Nakagami, J.-I. and Huang, Y. (2000). Constrained Markov decision processes with compact state and action spaces: The average case. Optimization 48 255-269. · Zbl 0961.93059 · doi:10.1080/02331930008844505
[30] Lewis, M. E. and Puterman, M. L. (2001). A probabilistic analysis of bias optimality in unichain Markov decision processes. IEEE Trans. Automat. Control 46 96-100. · Zbl 1017.90121 · doi:10.1109/9.898698
[31] Lund, R. B., Meyn, S. P. and Tweedie, R. L. (1996). Computable exponential convergence rates for stochastically ordered Markov processes. Ann. Appl. Probab. 6 218-237. · Zbl 0863.60093 · doi:10.1214/aoap/1034968072
[32] Phelps, R. R. (2001). Lectures on Choquet’s Theorem , 2nd ed. Lecture Notes in Math. 1757 . Springer, Berlin. · Zbl 0997.46005
[33] Piunovskiy, A. B. (1997). Optimal Control of Random Sequences in Problems with Constraints. Mathematics and Its Applications 410 . Kluwer, Dordrecht. · Zbl 0894.93001
[34] Piunovskiy, A. B. (1998). A controlled jump discounted model with constraints. Theory Probab. Appl. 42 51-72. · Zbl 0908.60026 · doi:10.4213/tvp1715
[35] Piunovskiy, A. B. (2005). Discounted Continuous Time Markov Decision Processes : The Convex Analytic Approach . 16th Triennial IFAC World Congress, Prague, Czech Republic.
[36] Prieto-Rumeau, T. and Hernández-Lerma, O. (2008). Ergodic control of continuous-time Markov chains with pathwise constraints. SIAM J. Control Optim. 47 1888-1908. · Zbl 1165.93040 · doi:10.1137/060668857
[37] Puterman, M. L. (1994). Markov Decision Processes : Discrete Stochastic Dynamic Programming . Wiley, New York. · Zbl 0829.90134
[38] Sennott, L. I. (1991). Constrained discounted Markov decision chains. Probab. Engrg. Inform. Sci. 5 463-475. · Zbl 1134.90531 · doi:10.1017/S0269964800002230
[39] Sennott, L. I. (1999). Stochastic Dynamic Programming and the Control of Queueing Systems . Wiley, New York. · Zbl 0997.93503
[40] Yushkevich, A. A. (1977). Controlled Markov models with countable states and continuous time. Theory Probab. Appl. 22 215-235. · Zbl 0379.93052 · doi:10.1137/1122029
[41] Zadorojniy, A. and Shwartz, A. (2006). Robustness of policies in constrained Markov decision processes. IEEE Trans. Automat. Control 51 635-638. · Zbl 1366.90219 · doi:10.1109/TAC.2006.872754
[42] Zhang, L. L. and Guo, X. P. (2008). Constrained continuous-time Markov decision processes with average criteria. Math. Methods Oper. Res. 67 323-340. · Zbl 1143.90033 · doi:10.1007/s00186-007-0154-0
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases, these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.