Constrained Markov decision processes with first passage criteria. (English) Zbl 1271.90104

Summary: This paper deals with constrained Markov decision processes (MDPs) with first passage criteria. The objective is to maximize the expected reward earned up to the first passage time to a target set, subject to a constraint on the associated expected cost over the same first passage time. The state space is denumerable, and the rewards/costs are possibly unbounded. In addition, the discount factor is state-action dependent and is allowed to equal one. We develop suitable conditions for the existence of a constrained optimal policy; these conditions generalize those for constrained MDPs with the standard discounted criteria. Moreover, we show that the constrained optimal policy randomizes between two stationary policies that differ in at most one state. Finally, we illustrate our results with a controlled queueing system, which exhibits some advantages of our optimality conditions.
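The summary does not reproduce the paper's notation, so the following is a minimal sketch of the first passage criteria in standard constrained-MDP notation; the symbols below (target set $B$, first passage time $\tau_B$, discount factor $\alpha$, reward $r$, cost $c$, constraint level $\rho$) are illustrative assumptions and need not match the paper's own symbols.

\documentclass{article}
\usepackage{amsmath}
\begin{document}
Let $\tau_B := \inf\{n \ge 0 : x_n \in B\}$ be the first passage time of the
state process $(x_n)$ to the target set $B$. For a policy $\pi$ and an initial
state $x \notin B$, define
\begin{align*}
V(x,\pi) &:= \mathbb{E}_x^{\pi}\Biggl[\sum_{n=0}^{\tau_B - 1}
  \biggl(\prod_{k=0}^{n-1}\alpha(x_k,a_k)\biggr)\, r(x_n,a_n)\Biggr],\\
C(x,\pi) &:= \mathbb{E}_x^{\pi}\Biggl[\sum_{n=0}^{\tau_B - 1}
  \biggl(\prod_{k=0}^{n-1}\alpha(x_k,a_k)\biggr)\, c(x_n,a_n)\Biggr],
\end{align*}
where $\alpha(x,a) \in (0,1]$ is the state-action dependent discount factor
(the empty product at $n=0$ equals $1$). The constrained problem is then
\[
  \text{maximize } V(x,\pi) \quad \text{subject to } C(x,\pi) \le \rho .
\]
\end{document}

As a consistency check on this reading: with a constant discount factor $\alpha \equiv \alpha_0 < 1$ and $\tau_B = \infty$ (no absorption in the target set), the two functionals reduce to the standard discounted reward and cost, which matches the summary's remark that the optimality conditions generalize those for constrained MDPs with the standard discounted criteria. The structural result quoted above then says that an optimal policy can be taken to follow one stationary policy everywhere except possibly at a single state, where it randomizes between the two candidate actions.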

MSC:

90C40 Markov and semi-Markov decision processes
