Summary
This paper develops a new framework for the study of Markov decision processes in which the control problem is viewed as an optimization problem on the set of canonically induced measures on the trajectory space of the joint state and control process. This set is shown to be compact convex. One then associates with each of the usual cost criteria (infinite horizon discounted cost, finite horizon, control up to an exit time) a naturally defined occupation measure such that the cost is an integral of some function with respect to this measure. These measures are shown to form a compact convex set whose extreme points are characterized. Classical results about existence of optimal strategies are recovered from this and several applications to multicriteria and constrained optimization problems are briefly indicated.
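As an illustration of the statement that the cost is an integral with respect to an occupation measure, the following is a minimal sketch for a finite state and action space under a fixed stationary randomized strategy. The two-state example data, the function names, and the normalization of the discounted occupation measure by (1 - beta) are assumptions made for this sketch and are not taken from the paper.

```python
# Sketch: discounted occupation measure of a small finite MDP under a fixed
# stationary randomized strategy. All data and names here are hypothetical.

def occupation_measure(P, policy, mu0, beta, horizon=2000):
    """nu(x, u) = (1 - beta) * sum_t beta^t * P(X_t = x, U_t = u), truncated at `horizon`."""
    n_states, n_actions = len(mu0), len(policy[0])
    nu = [[0.0] * n_actions for _ in range(n_states)]
    dist = list(mu0)          # law of X_t, starting from the initial law mu0
    weight = 1.0 - beta       # (1 - beta) * beta^t
    for _ in range(horizon):
        nxt = [0.0] * n_states
        for x in range(n_states):
            for u in range(n_actions):
                p_xu = dist[x] * policy[x][u]      # P(X_t = x, U_t = u)
                nu[x][u] += weight * p_xu
                for y in range(n_states):
                    nxt[y] += p_xu * P[x][u][y]    # push mass forward one step
        dist = nxt
        weight *= beta
    return nu


def discounted_cost(P, policy, cost, mu0, beta, sweeps=2000):
    """Expected discounted cost via the fixed-point iteration V = c_pi + beta * P_pi V."""
    n = len(mu0)
    V = [0.0] * n
    for _ in range(sweeps):
        V = [
            sum(
                policy[x][u]
                * (cost[x][u] + beta * sum(P[x][u][y] * V[y] for y in range(n)))
                for u in range(len(policy[x]))
            )
            for x in range(n)
        ]
    return sum(mu0[x] * V[x] for x in range(n))


# Hypothetical data: P[x][u][y] is the probability of moving from x to y under u.
P = [
    [[0.7, 0.3], [0.2, 0.8]],
    [[0.5, 0.5], [0.9, 0.1]],
]
cost = [[1.0, 2.0], [0.5, 3.0]]       # running cost c(x, u)
policy = [[0.6, 0.4], [0.3, 0.7]]     # policy[x][u] = P(U_t = u | X_t = x)
mu0 = [1.0, 0.0]                      # initial law of X_0
beta = 0.9                            # discount factor

nu = occupation_measure(P, policy, mu0, beta)
# Integral of the cost function against nu, rescaled by 1 / (1 - beta).
J_via_measure = sum(
    nu[x][u] * cost[x][u] for x in range(2) for u in range(2)
) / (1.0 - beta)
J_direct = discounted_cost(P, policy, cost, mu0, beta)
```

With this normalization, `nu` is a probability measure on state-action pairs, and `J_via_measure` agrees with `J_direct` up to floating-point error. Because the cost is linear in `nu`, minimizing it over the set of achievable occupation measures is a linear problem on a convex set, which is the viewpoint the summary describes.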
Additional information
Research supported by NSF Grant CDR-85-00108
Cite this article
Borkar, V.S. A convex analytic approach to Markov decision processes. Probab. Th. Rel. Fields 78, 583–602 (1988). https://doi.org/10.1007/BF00353877