Abstract
This paper presents a framework for mapping value-iteration and related successive approximation methods for Markov Decision Problems onto Hopfield neural networks, covering both the discounted and the undiscounted case with finite state and action spaces. We analyse the asymptotic behaviour of the control sets and give estimates of the convergence rate of the value-iteration scheme. We relate the convergence properties to an energy function, which is the key element in mapping Markov Decision Problems onto Hopfield networks. Finally, we consider an application to queueing systems in communication networks and present results of computer simulations of a Hopfield network running the equivalent Markov Decision Problem, together with comments on possible developments.
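For readers unfamiliar with the underlying scheme, the following is a minimal sketch of value iteration for a discounted finite Markov Decision Problem. The transition matrices, rewards, and discount factor below are hypothetical toy data chosen for illustration; they do not come from the paper, and the Hopfield-network mapping itself is not shown here.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Successive approximation of the optimal value function.

    P: list of transition matrices, P[a][s, s'] = Pr(s' | s, a)
    R: list of reward vectors, R[a][s] = expected one-step reward
    gamma: discount factor in (0, 1)
    Returns (V, policy) where policy[s] is a greedy optimal action.
    """
    n = P[0].shape[0]
    V = np.zeros(n)
    for _ in range(max_iter):
        # Bellman optimality update:
        # V(s) <- max_a [ R(s,a) + gamma * sum_{s'} P(s'|s,a) V(s') ]
        Q = np.array([R[a] + gamma * (P[a] @ V) for a in range(len(P))])
        V_new = Q.max(axis=0)
        # Sup-norm stopping test; the discounted operator is a
        # gamma-contraction, so this converges geometrically.
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new
    return V, Q.argmax(axis=0)

# Two-state, two-action toy problem (illustrative data only).
P = [np.array([[0.8, 0.2], [0.3, 0.7]]),
     np.array([[0.1, 0.9], [0.6, 0.4]])]
R = [np.array([1.0, 0.0]), np.array([0.0, 2.0])]
V, policy = value_iteration(P, R)
```

The geometric convergence rate implied by the contraction property is the quantity whose analogue the paper estimates for the Hopfield-network formulation.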
Copyright information
© 1995 Springer-Verlag Berlin Heidelberg
Cite this paper
Murgu, A. (1995). Mapping discounted and undiscounted Markov Decision Problems onto Hopfield neural networks. In: Andersson, S.I. (eds) Analysis of Dynamical and Cognitive Systems. Lecture Notes in Computer Science, vol 888. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-58843-4_17
Print ISBN: 978-3-540-58843-6
Online ISBN: 978-3-540-49113-2