Abstract
This paper studies denumerable continuous-time Markov decision processes with the expected total reward criterion. The authors first study the unconstrained model with possibly unbounded transition rates, and give suitable conditions on the controlled system's primitive data under which they show the existence of a solution to the total reward optimality equation and of an optimal stationary policy. They then impose a constraint on an expected total cost and consider the associated constrained model. Based on the results for the unconstrained model and the Lagrange multiplier approach, the authors prove the existence of constrained-optimal policies under some additional conditions. Finally, the results are applied to controlled queueing systems.
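The total reward optimality equation mentioned above, sup_a [r(i,a) + Σ_j q(j|i,a)V(j)] = 0, can be illustrated numerically on a small queueing example. The sketch below is a hypothetical clearing queue (no arrivals, state 0 absorbing and reward-free) with bounded rates, chosen purely for illustration and not the paper's exact model; all parameter values are assumptions. After uniformization with constant Λ, the equation becomes a discrete-time fixed point V(i) = max_a [r(i,a)/Λ + (a/Λ)V(i-1) + (1 - a/Λ)V(i)], which value iteration solves:

```python
import numpy as np

# Hypothetical clearing queue: states 0..N jobs, state 0 absorbing.
# Action a = chosen service rate; reward rate R*a (expected lump reward R
# per departure) minus holding cost h*i and service cost c*a.
N = 10
rates = [1.0, 3.0]        # available service rates (assumed)
h, c, R = 1.0, 0.5, 2.0   # holding cost, rate cost, reward per departure
Lam = max(rates)          # uniformization constant

V = np.zeros(N + 1)       # V[0] = 0: absorbing state earns nothing
for _ in range(10_000):
    V_new = np.zeros(N + 1)
    for i in range(1, N + 1):
        best = -np.inf
        for a in rates:
            r = R * a - h * i - c * a          # reward rate r(i, a)
            # uniformized optimality equation: serve (go to i-1) with
            # prob a/Lam, otherwise a fictitious self-loop
            val = r / Lam + (a / Lam) * V[i - 1] + (1 - a / Lam) * V[i]
            best = max(best, val)
        V_new[i] = best
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

# greedy (stationary) policy attained in the optimality equation
policy = [max(rates, key=lambda a: (R*a - h*i - c*a) / Lam
              + (a/Lam) * V[i-1] + (1 - a/Lam) * V[i])
          for i in range(1, N + 1)]
```

Because every state drains to the absorbing state 0, the total expected reward is finite and the iteration converges without discounting; the greedy policy read off from the fixed point is stationary, matching the kind of optimal stationary policy whose existence the paper establishes under its conditions.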
Additional information
This research is supported by the National Natural Science Foundation of China under Grant Nos. 10925107 and 60874004.
This paper was recommended for publication by Editor Guohua ZOU.
Cite this article
Guo, X., Zhang, L. Total reward criteria for unconstrained/constrained continuous-time Markov decision processes. J Syst Sci Complex 24, 491–505 (2011). https://doi.org/10.1007/s11424-011-8004-9