
A sensitivity formula for risk-sensitive cost and the actor-critic algorithm. (English) Zbl 0987.93080

Summary: We propose, for the risk-sensitive control of finite Markov chains, a counterpart to the popular ‘actor-critic’ algorithm for classical Markov decision processes. The algorithm is based on a ‘sensitivity formula’ for the risk-sensitive cost and is shown to converge with probability one to the desired solution. The proof technique is an adaptation of the ordinary differential equation (o.d.e.) approach to the analysis of two-time-scale stochastic approximation algorithms.
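For orientation, here is a minimal sketch of the standard setting, in notation of my own choosing rather than the paper's: for a controlled finite Markov chain \((X_n)\) with control sequence \((Z_n)\), running cost \(c\), and stationary randomized policy \(\pi\), the risk-sensitive (exponential) ergodic cost to be minimized is typically

\[ \Lambda(\pi) \;=\; \limsup_{N \to \infty} \frac{1}{N} \log E_{\pi}\!\left[ \exp\!\left( \sum_{m=0}^{N-1} c(X_m, Z_m) \right) \right]. \]

A two-time-scale actor-critic scheme of the kind described couples a fast ‘critic’ iterate \(x_n\) (value-like quantities) with a slow ‘actor’ iterate \(y_n\) (policy parameters),

\[ x_{n+1} = x_n + a(n)\big[ h(x_n, y_n) + M^{(1)}_{n+1} \big], \qquad y_{n+1} = y_n + b(n)\big[ g(x_n, y_n) + M^{(2)}_{n+1} \big], \]

with martingale-difference noise terms \(M^{(i)}_{n+1}\) and step sizes satisfying \(\sum_n a(n) = \sum_n b(n) = \infty\), \(\sum_n \big(a(n)^2 + b(n)^2\big) < \infty\), and \(b(n) = o(a(n))\). The critic then tracks its equilibrium for the quasi-static actor, and the actor's slow drift is analyzed through the limiting o.d.e.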

MSC:

93E20 Optimal stochastic control
90C40 Markov and semi-Markov decision processes
62L20 Stochastic approximation
Full Text: DOI

References:

[1] Balaji, S.; Meyn, S. P., Multiplicative ergodicity and large deviations for an irreducible Markov chain, Stochastic Process. Appl., 90, 1, 123-144 (2000) · Zbl 1046.60065
[2] Barto, A.; Sutton, R.; Anderson, C., Neuronlike adaptive elements that can solve difficult learning control problems, IEEE Trans. Systems Man Cybernet., 13, 834-846 (1983)
[3] Benveniste, A.; Metivier, M.; Priouret, P., Adaptive Algorithms and Stochastic Approximations (1990), Springer, New York · Zbl 0752.93073
[4] Bertsekas, D.; Tsitsiklis, J., Neuro-Dynamic Programming (1996), Athena Scientific, Belmont, MA · Zbl 0924.68163
[5] Bielecki, T. R.; Hernandez-Hernandez, D.; Pliska, S., Risk sensitive control of finite state Markov chains in discrete time, with applications to portfolio management, Math. Methods Oper. Res., 50, 167-188 (1999) · Zbl 0959.91029
[6] Borkar, V. S., Stochastic approximation with two time scales, Systems Control Lett., 29, 291-294 (1997) · Zbl 0895.62085
[7] V.S. Borkar, Asynchronous stochastic approximation, SIAM J. Control Optim. 36 (1998) 840-851 (Erratum: 38 (2000) 662-663). · Zbl 0922.62081
[8] V.S. Borkar, \(Q\)-learning for risk-sensitive control, Math. Oper. Res. 27 (2002) 294-311. · Zbl 1082.90576
[9] Borkar, V. S.; Meyn, S. P., The O.D.E. method for convergence of stochastic approximation and reinforcement learning, SIAM J. Control Optim., 38, 447-469 (2000) · Zbl 0990.62071
[10] V.S. Borkar, S.P. Meyn, Risk sensitive optimal control for Markov decision processes with monotone cost, Math. Oper. Res., submitted for publication. · Zbl 1082.90577
[11] Cao, X. R.; Chen, H. F., Perturbation realization, potentials and sensitivity analysis of Markov processes, IEEE Trans. Automat. Control, 42, 1382-1393 (1997) · Zbl 0889.93039
[12] Dai Pra, P.; Meneghini, L.; Runggaldier, W. J., Connections between stochastic control and dynamic games, Math. Control Signals Systems, 9, 303-326 (1996) · Zbl 0874.93096
[13] Di Masi, G. B.; Stettner, L., Risk-sensitive control of discrete time Markov processes with infinite horizon, SIAM J. Control Optim., 38, 61-78 (1999) · Zbl 0946.93043
[14] Fleming, W. H.; Hernandez-Hernandez, D., Risk-sensitive control of finite state machines on an infinite horizon I, SIAM J. Control Optim., 35, 1790-1810 (1997) · Zbl 0891.93085
[15] P.W. Glynn, Stochastic approximation for Monte Carlo optimization, Proceedings of the 1986 Winter Simulation Conference, 1986, pp. 285-289.
[16] D. Hernandez-Hernandez, S.I. Marcus, Risk sensitive control of Markov processes in countable state space, Systems Control Lett. 29 (1996) 147-155 (Corrigendum: 34 (1998) 105-106). · Zbl 0866.93101
[17] T. Jaakkola, S.P. Singh, M. Jordan, Reinforcement learning algorithms for partially observable Markov decision problems, in: G. Tesauro, D. Touretzky, T. Leen (Eds.), Advances in Neural Information Processing Systems, Vol. 7, Morgan Kaufmann, San Francisco, 1995, pp. 345-352.
[18] Konda, V. R.; Borkar, V. S., Actor-critic type learning algorithms for Markov decision processes, SIAM J. Control Optim., 38, 94-123 (1999) · Zbl 0938.93069
[19] V.R. Konda, J. Tsitsiklis, Actor-critic algorithms, SIAM J. Control Optim., submitted for publication. · Zbl 1049.93095
[20] Kushner, H. J.; Yin, G. G., Stochastic Approximation Algorithms and Applications (1997), Springer, New York · Zbl 0914.60006
[21] Ljung, L., Analysis of recursive stochastic algorithms, IEEE Trans. Automat. Control, 22, 551-575 (1977) · Zbl 0362.93031
[22] Marbach, P.; Tsitsiklis, J., Simulation-based optimization of Markov reward processes, IEEE Trans. Automat. Control, 46, 191-209 (2001) · Zbl 0992.93088
[23] Puterman, M. L., Markov Decision Processes (1994), Wiley, New York · Zbl 0829.90134
[24] Sutton, R. S.; Barto, A., Reinforcement Learning (1998), MIT Press, Cambridge, MA
[25] Whittle, P., Risk-Sensitive Optimal Control (1990), Wiley, New York · Zbl 0718.93068
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.