
A unified approach to time-aggregated Markov decision processes. (English) Zbl 1335.93149

Summary: This paper presents a unified approach to time-aggregated Markov Decision Processes (MDPs) with an average-cost criterion. The approach rests on a framework in which a time-aggregated MDP constitutes a Semi-Markov Decision Process (SMDP). By analyzing the performance sensitivity formulas of this SMDP, a number of optimization algorithms for time-aggregated MDPs, including those previously reported in the literature, can be derived in a simple and intuitive way.
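The SMDP view of time aggregation can be illustrated numerically. In the sketch below (a minimal, illustrative construction not taken from the paper; the chain, costs, and observable set are invented), a Markov chain is observed only at visits to a subset S of states; the segments between visits form an embedded semi-Markov process with a per-visit sojourn time and accumulated cost. The renewal-reward ratio formula for the embedded process then recovers the original chain's long-run average cost:

```python
import numpy as np

# Full chain: transition matrix P and per-step costs c (illustrative values).
P = np.array([[0.2, 0.5, 0.3],
              [0.4, 0.1, 0.5],
              [0.3, 0.3, 0.4]])
c = np.array([1.0, 4.0, 2.0])
S = [0, 1]   # observable states: the chain is inspected only here
U = [2]      # aggregated (unobservable) states

# Partition P with respect to (S, U).
P_SS = P[np.ix_(S, S)]; P_SU = P[np.ix_(S, U)]
P_US = P[np.ix_(U, S)]; P_UU = P[np.ix_(U, U)]
M = np.linalg.inv(np.eye(len(U)) - P_UU)  # fundamental matrix over U

# Embedded process on S: transition matrix Q, expected sojourn time tau
# between consecutive visits to S, and expected accumulated cost G.
Q = P_SS + P_SU @ M @ P_US
tau = 1.0 + P_SU @ (M @ np.ones(len(U)))
G = c[S] + P_SU @ (M @ c[U])

def stationary(T):
    """Stationary distribution of a stochastic matrix T (Perron vector)."""
    evals, evecs = np.linalg.eig(T.T)
    v = np.real(evecs[:, np.argmax(np.real(evals))])
    return v / v.sum()

# SMDP ratio formula versus the full chain's average cost.
nu = stationary(Q)
eta_smdp = (nu @ G) / (nu @ tau)
eta_full = stationary(P) @ c
```

Here `eta_smdp` and `eta_full` coincide, which is the identity underlying the time-aggregation framework: optimizing the embedded SMDP's ratio criterion is equivalent to optimizing the original average-cost MDP.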

MSC:

93E25 Computational methods in stochastic control (MSC2010)
90C40 Markov and semi-Markov decision processes
93E20 Optimal stochastic control
Full Text: DOI

References:

[1] Arruda, E. F.; Fragoso, M. D., Time aggregated Markov decision processes via standard dynamic programming, Operations Research Letters, 39, 193-197 (2011) · Zbl 1219.90181
[2] Barto, A. G.; Mahadevan, S., Recent advances in hierarchical reinforcement learning, Discrete Event Dynamic Systems: Theory and Applications, 13, 343-379 (2003) · Zbl 1034.93003
[3] Bertsekas, D. P., A new value iteration method for the average cost dynamic programming problem, SIAM Journal on Control and Optimization, 36, 742-759 (1998) · Zbl 0909.90269
[4] Boyd, S.; Vandenberghe, L., Convex optimization (2004), Cambridge University Press · Zbl 1058.90049
[5] Cao, X. R., Semi-Markov decision problems and performance sensitivity analysis, IEEE Transactions on Automatic Control, 48, 758-769 (2003) · Zbl 1364.90344
[6] Cao, X. R., Stochastic learning and optimization (2007), Springer: Springer New York · Zbl 1130.93057
[7] Cao, X. R.; Chen, H. F., Perturbation realization, potentials, and sensitivity analysis of Markov processes, IEEE Transactions on Automatic Control, 42, 1382-1393 (1997) · Zbl 0889.93039
[8] Cao, X. R.; Ren, Z. Y.; Bhatnagar, S.; Fu, M.; Marcus, S., A time aggregation approach to Markov decision processes, Automatica, 38, 929-943 (2002) · Zbl 1026.93054
[9] Chen, C. L.; Li, H. X.; Dong, D. Y., Hybrid control for robot navigation—a hierarchical Q-learning algorithm, IEEE Robotics and Automation Magazine, 10, 37-47 (2008)
[10] Guo, X. P.; Hernández-Lerma, O., Continuous-time Markov decision processes: theory and applications (2009), Springer: Springer New York · Zbl 1209.90002
[11] Howard, R., Dynamic probabilistic systems volume II: semi-Markov decision processes (1971), Wiley: Wiley New York · Zbl 0227.90032
[12] Li, Y. J.; Cao, F., A basic formula for performance gradient estimation of semi-Markov decision processes, European Journal of Operational Research, 224, 333-339 (2013) · Zbl 1292.90310
[13] Puterman, M. L., Markov decision processes: discrete stochastic dynamic programming (1994), Wiley: Wiley New York · Zbl 0829.90134
[14] Ren, Z.; Krogh, B. H., Markov decision processes with fractional costs, IEEE Transactions on Automatic Control, 50, 646-650 (2005) · Zbl 1365.90261
[15] Ross, S. M., Stochastic processes (1996), Wiley: Wiley New York · Zbl 0888.60002
[16] Sun, T.; Zhao, Q. C.; Luh, P. B., Incremental value iteration for time-aggregated Markov decision processes, IEEE Transactions on Automatic Control, 52, 2177-2182 (2007) · Zbl 1366.90218
[17] Sutton, R. S.; Barto, A. G., Reinforcement learning: an introduction (1998), MIT Press: MIT Press Cambridge, MA
[18] Zhang, B.; Ho, Y. C., Performance gradient estimation for very large finite Markov chains, IEEE Transactions on Automatic Control, 36, 1218-1227 (1991) · Zbl 0736.60069