×

Blackwell optimality in Markov decision processes with partial observation. (English) Zbl 1103.90402

Summary: A Blackwell \(\epsilon\)-optimal strategy in a Markov decision process is a strategy that is \(\epsilon\)-optimal for every discount factor sufficiently close to 1. We prove the existence of Blackwell \(\epsilon\)-optimal strategies in finite Markov decision processes with partial observation.

MSC:

90C40 Markov and semi-Markov decision processes

References:

[1] ALTMAN, E. (2001). Applications of Markov decision processes in communication networks: a survey. In Handbook of Markov Decision Processes: Methods and Applications (E. Feinberg and A. Shwartz, eds.). Kluwer, Boston.
[2] ARAPOSTATHIS, A., BORKAR, V. S., FERNÁNDEZ-GAUCHERAND, E., GHOSH, M. K. and · Zbl 0984.34009 · doi:10.1006/jmaa.2000.7393
[3] MARCUS, S. I. (1993). Discrete-time controlled Markov processes with average cost criterion: a survey. SIAM J. Control Optim. 31 282-344. · Zbl 0770.93064 · doi:10.1137/0331018
[4] BLACKWELL, D. (1962). Discrete dy namic programming. Ann. Math. Statist. 33 719-726. · Zbl 0133.12906 · doi:10.1214/aoms/1177704593
[5] BORKAR, V. S. (1988). Control of Markov chains with long-run average cost criterion. In Stochastic Differential Sy stems, Stochastic Control Theory and Applications (W. Fleming and P. L. Lions, eds.) 57-77. Springer, Berlin. · Zbl 0663.93078
[6] BORKAR, V. S. (1991). Topics in Controlled Markov Chains. Longman, Essex. · Zbl 0725.93082
[7] FERNÁNDEZ-GAUCHERAND, E., ARAPOSTATHIS, A. and MARCUS, S. I. (1989). On partially observable Markov decision processes with an average cost criterion. In Proceedings of the 28th IEEE Conference on Decision and Control 1267-1272. IEEE Press, New York.
[8] KALLENBERG, L. (2001). Finite state and action MDPs. In Handbook of Markov Decision Processes: Methods and Applications (E. Feinberg and A. Shwartz, eds.) 21-30. Kluwer, Boston. · Zbl 1004.90063
[9] KUHN, H. W. (1953). Extensive games and the problem of information. In Contributions to the Theory of Games II (H. W. Kuhn and A. W. Tucker, eds.) 193-216. Princeton Univ. Press. · Zbl 0050.14303
[10] LANE, D. E. (1989). A partially observable model of decision making by fishermen. Oper. Res. 37 240-254. JSTOR: · Zbl 0666.90039 · doi:10.1287/opre.37.2.240
[11] LEHRER, E. and SORIN, S. (1992). A uniform Tauberian theorem in dy namic programming. Math. Oper. Res. 17 303-307. JSTOR: · Zbl 0771.90099 · doi:10.1287/moor.17.2.303
[12] MITRA, T., RAY, D. and ROY, R. (1991). The economics of orchards: an exercise in pointinput, flow-output capital theory. J. Econom. Theory 53 12-50. · Zbl 0733.90023 · doi:10.1016/0022-0531(91)90141-P
[13] MONAHAN, G. E. (1982). A survey of partially observable Markov decision processes: theory, models, and algorithms. Management Sci. 28 1-16. JSTOR: · Zbl 0486.90084 · doi:10.1287/mnsc.28.1.1
[14] PUTERMAN, M. L. (1994). Markov Decision Processes: Discrete Stochastic Dy namic Programming. Wiley, New York.
[15] RHENIUS, D. (1974). Incomplete information in Markovian decision models. Ann. Statist. 2 1327-1334. · Zbl 0294.49007 · doi:10.1214/aos/1176342886
[16] SAWARAGI, Y. and YOSHIKAWA, T. (1970). Discrete time Markovian decision processes with incomplete state observation. Ann. Math. Statist. 41 78-86. · Zbl 0188.48102 · doi:10.1214/aoms/1177697190
[17] SENNOTT, L. I. (1999). Stochastic Dy namic Programming and the Control of Queueing Sy stems. Wiley, New York. · Zbl 0997.93503
[18] YUSHKEVICH, A. A. (1976). Reduction of a controlled Markov model with incomplete date to a problem with complete information in the case of Borel state and control spaces. Theory Probab. Appl. 21 153-158. · Zbl 0357.93040 · doi:10.1137/1121014
[19] EVANSTON, ILLINOIS 60208 AND SCHOOL OF MATHEMATICAL SCIENCES TEL AVIV UNIVERSITY TEL AVIV 69978 ISRAEL E-MAIL: eilons@post.tau.ac.il N. VIEILLE ECOLE POLy TECHNIQUE AND DÉPARTEMENT FINANCE ET ECONOMIE HEC 1, RUE DE LA LIBÉRATION 78 351 JOUY-EN-JOSAS FRANCE E-MAIL: vieille@hec.fr
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.