Document Zbl 0995.93075

On terminating Markov decision processes with a risk-averse objective function. (English) Zbl 0995.93075

Automatica 37, No. 9, 1379-1386 (2001).

This paper deals with terminating risk-sensitive finite-state Markov decision processes that have an additional absorbing, cost-free state. The terminating problem is thus a search for stochastic shortest paths. Introducing two dynamic programming operators, the author obtains the following results: (i) the existence and characterization of an optimal policy; (ii) convergence properties for value iteration and policy iteration. Moreover, he illustrates the results with two computational examples.
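The review does not reproduce the paper's operators, so the following is only a minimal sketch of value iteration for a problem of this type. It assumes the risk-averse objective is the expected exponential of the total cost with a risk parameter gamma > 0, which leads to a multiplicative Bellman equation with value 1 at the terminal state. The function name, the data layout, and the toy example are all hypothetical and are not taken from the paper.

```python
import math

def risk_sensitive_value_iteration(P, C, gamma, tol=1e-12, max_iter=10_000):
    """Value iteration for a terminating MDP under an ASSUMED exponential-utility
    objective E[exp(gamma * total cost)] (not necessarily the paper's formulation).

    P[i][u][j]: probability of moving from non-terminal state i to state j under
                action u, where j == n denotes the absorbing, cost-free state.
    C[i][u][j]: cost incurred on that transition.
    Returns (V, policy) for the n non-terminal states.
    """
    n = len(P)

    def q_value(V, i, u):
        # Multiplicative Bellman backup for state i and action u.
        return sum(p * math.exp(gamma * c) * V[j]
                   for j, (p, c) in enumerate(zip(P[i][u], C[i][u])))

    V = [1.0] * (n + 1)  # V[n] = exp(0) = 1 at the terminal state
    for _ in range(max_iter):
        new_V = V[:]
        for i in range(n):
            new_V[i] = min(q_value(V, i, u) for u in range(len(P[i])))
        diff = max(abs(a - b) for a, b in zip(new_V, V))
        V = new_V
        if diff < tol:
            break

    # Greedy policy with respect to the final value estimate.
    policy = [min(range(len(P[i])), key=lambda u: q_value(V, i, u))
              for i in range(n)]
    return V[:n], policy

# Hypothetical toy problem: one non-terminal state (index 0), terminal index 1.
# Action 0: cost 1 per step, terminates with probability 0.5, else stays.
# Action 1: cost 3, terminates with certainty.
P = [[[0.5, 0.5], [0.0, 1.0]]]
C = [[[1.0, 1.0], [3.0, 3.0]]]

V_mild, policy_mild = risk_sensitive_value_iteration(P, C, gamma=0.1)
V_averse, policy_averse = risk_sensitive_value_iteration(P, C, gamma=0.5)
```

In this toy problem the repeated cheap action has the lower expected cost (2 versus 3), and the sketch selects it for gamma = 0.1. For gamma = 0.5 the variability of the repeated action is penalized more heavily and the certain but costlier action is selected, which illustrates the effect of a risk-averse objective on the resulting shortest-path policy.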

Reviewer: M.Nisio (Osaka)

Cited in 8 Documents

MSC:

93E20	Optimal stochastic control
90C40	Markov and semi-Markov decision processes
49L20	Dynamic programming in optimal control and differential games
49J55	Existence of optimal solutions to problems involving randomness

Keywords:

risk-sensitive finite states Markov decision processes; terminating problem; stochastic shortest paths; dynamic programming; convergence; value iteration; policy iteration

References:

[1]	Bertsekas, D. P.; Tsitsiklis, J. N., Parallel and distributed computation: Numerical methods (1989), Prentice-Hall: Englewood Cliffs, NJ · Zbl 0743.65107
[2]	Bertsekas, D. P.; Tsitsiklis, J. N., Analysis of stochastic shortest path problems, Mathematics of Operations Research, 16, 3, 580-595 (1991) · Zbl 0751.90077
[3]	Chung, K.-J.; Sobel, M. J., Discounted MDPs: Distribution functions and exponential utility maximization, SIAM Journal on Control and Optimization, 25, 49-62 (1987) · Zbl 0617.90085
[4]	Coraluppi, S. P.; Marcus, S. I., Risk-sensitive and minimax control of discrete-time, finite-state Markov decision processes, Automatica, 35, 301-309 (1999) · Zbl 0936.93052
[5]	Denardo, E. V.; Rothblum, U. G., Optimal stopping, exponential utility, and linear programming, Mathematical Programming, 16, 2, 228-244 (1979) · Zbl 0401.90101
[6]	Fleming, W. H., & McEneaney, W. M. (1992). Risk-sensitive optimal control and differential games. In: T. E. Duncan and B. Pasik-Duncan (Eds.), Proceedings of the stochastic theory and adaptive controls workshop · Zbl 0788.90097
[7]	Glover, K.; Doyle, J. C., State-space formulae for all stabilizing controllers that satisfy an \(H_\infty\)-norm bound and relations to risk-sensitivity, Systems and Control Letters, 11, 167-172 (1988) · Zbl 0671.93029
[8]	Hernandez-Hernandez, D.; Marcus, S. I., Risk-sensitive control of Markov processes in countable state space, Systems and Control Letters, 29, 147-155 (1996) · Zbl 0866.93101
[9]	Howard, R. A.; Matheson, J. E., Risk-sensitive Markov decision processes, Management Science, 18, 356-369 (1972) · Zbl 0238.90007
[10]	Jacobson, D. H., Optimal stochastic linear systems with exponential performance criteria and their relation to deterministic differential games, IEEE Transactions on Automatic Control, AC-18, 124-131 (1973) · Zbl 0274.93067
[11]	Jaquette, S. C. (1975). Utility optimal policies in an undiscounted Markov decision process
[12]	Marcus, S. I., Fernandez-Gaucherand, E., Hernandez-Hernandez, D., Coraluppi, S., & Fard, P. (1997). Risk sensitive Markov decision processes. In C. I. Byrnes et al. (Eds.), Systems and control in the twenty-first century. · Zbl 1065.90543
[13]	Patek, S. D.; Bertsekas, D. P., Stochastic shortest path games, SIAM Journal on Control and Optimization, 37, 3, 804-824 (1999) · Zbl 0918.90148
[14]	Rothblum, U. G. (1974). Multiplicative Markov decision chains · Zbl 0535.90097
[15]	Runolfsson, T., The equivalence between infinite-horizon optimal control of stochastic systems with exponential-of-integral performance index and stochastic differential games, IEEE Transactions on Automatic Control, 39, 8, 1551-1563 (1994) · Zbl 0930.93084
[16]	Whittle, P., Risk-sensitive linear/quadratic/Gaussian control, Advances in Applied Probability, 13, 764-777 (1981) · Zbl 0489.93067
[17]	Whittle, P. (1990). Risk-sensitive optimal control · Zbl 0718.93068
[18]	Whittle, P. (1996a). Optimal control: Basics and beyond · Zbl 0880.49001
[19]	Whittle, P. (1996b). Why discount? The rationale of discounting in optimisation problems. In C. C. Heyde et al. (Eds.), Athens conference on applied probability and time series: Vol. 1. Applied probability. Lecture Notes in Statistics, Vol. 114. Berlin: Springer. · Zbl 0858.90012
