The discounted method and equivalence of average criteria for risk-sensitive Markov decision processes on Borel spaces. (English) Zbl 1196.60127

The work concerns discrete-time Markov decision processes evolving on a Borel state space. The system is driven by a risk-averse decision maker with a constant risk sensitivity coefficient \(\lambda > 0\), and the performance of a control policy is measured by the (superior limit) risk-sensitive average cost criterion associated with a nonnegative cost function. Under mild (semi-)continuity and compactness conditions, two problems are studied via the discounted approach: (i) establishing the existence of optimal stationary policies, and (ii) determining conditions under which the optimal value functions associated with the inferior-limit and superior-limit average criteria coincide. The approach relies on standard dynamic programming ideas and on a simple analytical derivation of a Tauberian relation.
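For orientation, the (superior limit) risk-sensitive average cost criterion described above typically takes the following form; the notation here is generic (a policy \(\pi\), states \(x_t\), actions \(a_t\), nonnegative cost \(c\)), and the paper's exact conventions may differ in detail:

\[
J(\pi, x) \;=\; \limsup_{n \to \infty} \frac{1}{\lambda n} \, \log \mathbb{E}_x^{\pi}\!\left[ \exp\!\left( \lambda \sum_{t=0}^{n-1} c(x_t, a_t) \right) \right],
\]

with the inferior-limit criterion obtained by replacing \(\limsup\) with \(\liminf\). Problem (ii) then asks when the optimal values of these two criteria agree, i.e. when \(\inf_\pi J(\pi,x)\) is unchanged by the choice of limit.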

MSC:

60J05 Discrete-time Markov processes on general state spaces
90C40 Markov and semi-Markov decision processes
