The risk probability criterion for discounted continuous-time Markov decision processes. (English) Zbl 1386.93308

Summary: In this paper, we consider the risk probability minimization problem for infinite-horizon discounted continuous-time Markov decision processes (CTMDPs) with unbounded transition rates. First, we introduce a class of policies that depend on histories augmented with reward levels. We then construct the corresponding probability spaces and establish the non-explosion of the state process. Second, under suitable conditions we prove, via an iteration technique, that the value function is a solution to the optimality equation for the probability criterion, and we obtain a value iteration algorithm to compute (or at least approximate) the value function. Furthermore, under an additional condition we establish the uniqueness of the solution to the optimality equation and the existence of an optimal policy. Finally, we illustrate our results with two examples: the first verifies our conditions for CTMDPs with unbounded transition rates, while the second demonstrates the numerical calculation of the value function and an optimal policy.
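
The summary describes a value iteration algorithm for the risk probability criterion. The paper's algorithm is set in continuous time with unbounded transition rates; as a rough orientation only, the sketch below implements the standard discrete-time analogue of such an iteration (in the spirit of White's threshold-probability recursion), where V(x, λ) is the minimal probability that the total discounted reward from state x stays at or below the target level λ. The discount factor, the model data, the level grid, and the helper level_value are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Minimal sketch (discrete-time analogue, not the paper's CTMDP algorithm):
# value iteration for a risk probability criterion. V[x, i] approximates the
# minimal probability that the total discounted reward from state x stays at
# or below the target level levels[i]. All model data are illustrative.

beta = 0.9                                 # discount factor (assumption)
n_states, n_actions = 3, 2
levels = np.linspace(0.0, 20.0, 201)       # grid of target reward levels

rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[x, a, y]
r = rng.uniform(0.5, 2.0, size=(n_states, n_actions))             # rewards > 0

def level_value(Vy, lam):
    """Evaluate V(y, lam) by linear interpolation in the level variable.
    A negative level cannot be met by nonnegative rewards, so return 0;
    np.interp clamps at the top of the grid."""
    if lam < levels[0]:
        return 0.0
    return np.interp(lam, levels, Vy)

# Start from V_0 = 1 (the empty-horizon reward, 0, is always <= lam >= 0);
# the iterates then decrease monotonically toward the minimal risk probability.
V = np.ones((n_states, len(levels)))
for _ in range(500):
    V_new = np.empty_like(V)
    for x in range(n_states):
        for i, lam in enumerate(levels):
            # After earning r(x, a), the residual target is shifted and
            # rescaled by the discount factor before the next transition.
            V_new[x, i] = min(
                sum(P[x, a, y] * level_value(V[y], (lam - r[x, a]) / beta)
                    for y in range(n_states))
                for a in range(n_actions)
            )
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new
```

The shift-and-rescale of the target level in the recursion is the discrete-time counterpart of why the value function must carry the reward level as an extra argument, and hence why the paper works with policies depending on histories augmented with reward levels.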

MSC:

93E20 Optimal stochastic control
90C40 Markov and semi-Markov decision processes
49L20 Dynamic programming in optimal control and differential games
