×

Absorbing continuous-time Markov decision processes with total cost criteria. (English) Zbl 1282.90229

Authors’ abstract: In this paper we study absorbing continuous-time Markov decision processes in Polish state spaces with unbounded transition and cost rates, and history-dependent policies. The performance measures is the expected total undiscounted costs. For the unconstrained problem, we show the existence of a deterministic stationary optimal policy, whereas, for the constrained problems with \(N\) constraints, we show the existence of a mixed stationary optimal policy, where the mixture is over no more than \(N+1\) deterministic stationary policies. Furthermore, the strong duality result is obtained for the associated linear programs.

MSC:

90C40 Markov and semi-Markov decision processes
60J25 Continuous-time Markov processes on general state spaces
60J75 Jump processes (MSC2010)

References:

[1] Aliprantis, C. and Border, K. (2007). Infinite Dimensional Analysis . Springer, New York. · Zbl 1156.46001
[2] Altman, E. (1999). Constrained Markov Decision Processes . Chapman and Hall/CRC, Boca Raton. · Zbl 0963.90068
[3] Bertsekas, D. P. and Shreve, S. E. (1978). Stochastic Optimal Control . Academic Press, New York. · Zbl 0471.93002
[4] Bertsekas, D., Nedíc, A. and Ozdaglar, A. (2003). Convex Analysis and Optimization . Athena Scientific, Belmont, MA. · Zbl 1140.90001
[5] Bogachev, V. I. (2007). Measure Theory , Vol. I. Springer, Berlin. · Zbl 1120.28001
[6] Bogachev, V. I. (2007). Measure Theory , Vol. II. Springer, Berlin. · Zbl 1120.28001
[7] Clancy, D. and Piunovskiy, A. B. (2005). An explicit optimal isolation policy for a determinisitc epidemic model. Appl. Math. Comput. 163 , 1109-1121. · Zbl 1061.92053 · doi:10.1016/j.amc.2004.06.028
[8] Dubins, L. E. (1962). On extreme points of convex sets. J. Math. Anal. Appl. 5 , 237-244. · Zbl 0124.37704 · doi:10.1016/S0022-247X(62)80007-9
[9] Feinberg, E. A. and Fei, J. (2009). An inequality for variances of the discounted rewards. J. Appl. Prob. 46 , 1209-1212. · Zbl 1187.60030 · doi:10.1239/jap/1261670699
[10] Feinberg, E. A. and Rothblum, U. G. (2012). Splitting randomized stationary policies in total-reward Markov decision processes. Math. Operat. Res. 37 , 129-153. · Zbl 1243.90233 · doi:10.1287/moor.1110.0525
[11] Gleissner, W. (1988). The spread of epidemics. Appl. Math. Comput. 27 , 167-171. · Zbl 0647.92012 · doi:10.1016/0096-3003(88)90027-6
[12] Guo, X. (2007). Constrained optimization for average cost continuous-time Markov decision processes. IEEE Trans. Automatic Control 52 , 1139-1143. · Zbl 1366.90217 · doi:10.1109/TAC.2007.899040
[13] Guo, X. and Hernández-Lerma, O. (2009). Continuous-t ime Markov Decision Processes. Springer, Berlin.
[14] Guo, X. and Rieder, U. (2006). Average optimality for continuous-time Markov decision processes in Polish spaces. Ann. Appl. Prob. 16 , 730-756. · Zbl 1160.90010 · doi:10.1214/105051606000000105
[15] Guo, X. and Song, X. (2011). Discounted continuous-time constrained Markov decision processes in Polish spaces. Ann. Appl. Prob. 21 , 2016-2049. · Zbl 1258.90104 · doi:10.1214/10-AAP749
[16] Guo, X. and Zhang, L. (2011). Total reward criteria for unconstrained/constrained continuous-time Markov decision processes. J. Systems Sci. Complex. 24 , 491-505. · Zbl 1254.90290
[17] Guo, X., Huang, Y. and Song, X. (2012). Linear programming and constrained average optimality for general continuous-time Markov decision processes in history-dependent policies. SIAM J. Control Optimization 50 , 23-47. · Zbl 1250.90108 · doi:10.1137/100805169
[18] Hernández-Lerma, O. and Lasserre, J. B. (1996). Discrete-time Markov Control Processes . Springer, New York. · Zbl 0853.93106
[19] Hernández-Lerma, O. and Lasserre, J. B. (1999). Further Topics on Discrete-Time Markov Control Processes . Springer, New York. · Zbl 0928.93002
[20] Hernández-Lerma, O. and Lasserre, J. B. (2000). Fatou’s lemma and Lebesgue’s convergence theorem for measures. J. Appl. Math. Stoch. Anal. 13 , 137-146. · Zbl 0961.28002 · doi:10.1155/S1048953300000150
[21] Himmelberg, C. J. (1975). Measurable relations. Fund. Math. 87 , 53-72. · Zbl 0296.28003
[22] Himmelberg, C. J., Parthasarathy, T. and Van Vleck, F. S. (1976). Optimal plans for dynamic programming problems. Math. Operat. Res. 1 , 390-394. · Zbl 0368.90134 · doi:10.1287/moor.1.4.390
[23] Jacod, J. (1975). Multivariate point processes: predictable projection, Radon-Nykodym derivatives, representation of martingales. Z. Wahrscheinlichkeitsth. 31 , 235-253. · Zbl 0302.60032 · doi:10.1007/BF00536010
[24] Kitaev, M. (1986). Semi-Markov and jump Markov controlled models: average cost criterion. Theory. Prob. Appl. 30 , 272-288. · Zbl 0586.90093 · doi:10.1137/1130036
[25] Kitaev, M. and Rykov, V. V. (1995). Controlled Queueing Systems . CRC Press, Boca Raton, FL. · Zbl 0876.60077
[26] Piunovskiy, A. B. (1997). Optimal Control of Random Sequences in Problems with Constraints . Kluwer, Dordrecht. · Zbl 0894.93001
[27] Piunovskiy, A. B. (1998). A controlled jump discounted model with constraints. Theory Prob. Appl. 42 , 51-71. · Zbl 0908.60026 · doi:10.4213/tvp1715
[28] Piunovskiy, A. B. (2004). Optimal interventions in countable jump Markov processes. Math. Operat. Res. 29 , 289-308. · Zbl 1082.60074 · doi:10.1287/moor.1030.0063
[29] Piunovskiy, A. and Zhang, Y. (2011). Accuracy of fluid approximation to controlled birth-and-death processes: absorbing case. Math. Meth. Operat. Res. 73 , 159-187. · Zbl 1216.93110 · doi:10.1007/s00186-010-0340-3
[30] Piunovskiy, A. and Zhang, Y. (2011). Discounted continuous-time Markov decision processes with unbounded rates: the dynamic programming approach. Preprint. Available at http://arxiv.org/abs/1103.0134v1. · Zbl 1242.90283 · doi:10.1137/10081366X
[31] Piunovskiy, A. and Zhang, Y. (2011). Discounted continuous-time Markov decision processes with unbounded rates: the convex analytic approach. SIAM J. Control Optimization 49 , 2032-2061. · Zbl 1242.90283 · doi:10.1137/10081366X
[32] Piunovskiy, A. and Zhang, Y. (2012). The transformation method for continuous-time Markov decision processes. J. Optimization Theory Appl. 154 , 691-712. · Zbl 1256.90048 · doi:10.1007/s10957-012-0015-8
[33] Pliska, S. R. (1975). Controlled jump processes. Stoch. Process Appl. 3 , 259-282. · Zbl 0313.60055 · doi:10.1016/0304-4149(75)90025-3
[34] Prieto-Rumeau, T. and Hernández-Lerma, O. (2008). Ergodic control of continuous-time Markov chains with pathwise constraints. SIAM J. Control Optimization 47 , 1888-1908. · Zbl 1165.93040 · doi:10.1137/060668857
[35] Rockafellar, R. T. (1974). Conjugate Duality and Optimization . SIAM, Philadelphia, PA. · Zbl 0296.90036
[36] Varadarajan, V. S. (1958). Weak convergence of measures on separable metric spaces. Sankhyā 19 , 15-22. · Zbl 0082.26505
[37] Yeh, J. (2006). Real analysis: Theory of Measure and Integration , 2nd edn. World Scientific, Hackensack, NJ. · Zbl 1098.28002
[38] Zhang, Y. (2011). Convex analytic approach to constrained discounted Markov decision processes with non-constant discount factor. TOP , 31pp.
[39] Zhu, Q. (2008). Average optimality for continuous-time jump Markov decision processes with a policy iteration approach. J. Math. Anal. Appl. 339 , 691-704. · Zbl 1156.90023 · doi:10.1016/j.jmaa.2007.06.071
[40] Zhu, Q. and Prieto-Rumeau, T. (2008). Bias and overtaking optimality for continuous-time jump Markov decision processes in Polish spaces. J. Appl. Prob. 45 , 417-429. · Zbl 1189.90187 · doi:10.1239/jap/1214950357
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.