Adaptive control for discrete-time Markov processes with unbounded costs: Discounted criterion. (English) Zbl 1274.90474

Summary: We study the adaptive control problem for discrete-time Markov control processes with Borel state and action spaces and possibly unbounded one-stage costs. The processes are given by recurrence equations \(x_{t+1}=F(x_t,a_t,\xi_t)\), \(t=0,1,\ldots\), with i.i.d. \(\mathbb{R}^k\)-valued random vectors \(\xi_t\) whose density \(\rho\) is unknown. Assuming that \(\xi_t\) is observable, we propose a procedure for the statistical estimation of \(\rho\) that allows us to prove discounted asymptotic optimality of two types of adaptive policies used earlier for processes with bounded costs.
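
The setting in the summary can be illustrated with a minimal sketch: a controller observes the disturbances \(\xi_t\) generated along the trajectory \(x_{t+1}=F(x_t,a_t,\xi_t)\) and builds an estimate \(\rho_t\) of the unknown density \(\rho\). Everything concrete here is an assumption made for illustration and is not taken from the paper: the scalar dynamics `F`, the constant placeholder action, the exponential disturbance law, and the Gaussian kernel estimator `kde` with a fixed bandwidth. The summary does not say which estimator the authors use.

```python
import math
import random


def F(x, a, xi):
    # Hypothetical inventory-style dynamics, chosen only for illustration.
    return max(x + a - xi, 0.0)


def kde(samples, bandwidth):
    """Return a Gaussian kernel density estimate built from the samples."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def rho_hat(s):
        return norm * sum(
            math.exp(-0.5 * ((s - z) / bandwidth) ** 2) for z in samples
        )

    return rho_hat


def simulate(T, seed=0):
    """Run the recursion x_{t+1} = F(x_t, a_t, xi_t) and record each xi_t."""
    rng = random.Random(seed)
    x = 5.0
    observed = []
    for _ in range(T):
        a = 1.0  # placeholder action; an adaptive policy would use x and rho_t
        xi = rng.expovariate(1.0)  # "true" density, unknown to the controller
        x = F(x, a, xi)
        observed.append(xi)  # xi_t is assumed observable, as in the summary
    return x, observed


_, xis = simulate(500)
rho_t = kde(xis, bandwidth=0.3)  # estimate of rho after 500 observations
```

In an adaptive scheme of the kind the summary describes, the estimate `rho_t` would replace the unknown \(\rho\) when choosing the action at stage \(t\); the sketch stops at the estimation step and does not implement either of the two policy types.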

MSC:

90C40 Markov and semi-Markov decision processes
62M05 Markov processes: estimation; hidden Markov models

References:

[1] Agrawal R.: Minimizing the learning loss in adaptive control of Markov chains under the weak accessibility condition. J. Appl. Probab. 28 (1991), 779-790 · Zbl 0741.60070 · doi:10.2307/3214681
[2] Ash R. B.: Real Analysis and Probability. Academic Press, New York 1972 · Zbl 1381.28001
[3] Cavazos-Cadena R.: Nonparametric adaptive control of discounted stochastic system with compact state space. J. Optim. Theory Appl. 65 (1990), 191-207 · Zbl 0699.93053 · doi:10.1007/BF01102341
[4] Dynkin E. B., Yushkevich A. A.: Controlled Markov Processes. Springer-Verlag, New York 1979
[5] Fernández-Gaucherand E., Arapostathis A., Marcus S. I.: A methodology for the adaptive control of Markov chains under partial state information. Proc. of the 1992 Conf. on Information Sci. and Systems, Princeton, New Jersey, pp. 773-775
[6] Fernández-Gaucherand E., Arapostathis A., Marcus S. I.: Analysis of an adaptive control scheme for a partially observed controlled Markov chain. IEEE Trans. Automat. Control 38 (1993), 987-993 · Zbl 0786.93089 · doi:10.1109/9.222316
[7] Gordienko E. I.: Adaptive strategies for certain classes of controlled Markov processes. Theory Probab. Appl. 29 (1985), 504-518 · Zbl 0577.93067 · doi:10.1137/1129064
[8] Gordienko E. I.: Controlled Markov sequences with slowly varying characteristics II. Adaptive optimal strategies. Soviet J. Comput. Systems Sci. 23 (1985), 87-93 · Zbl 0618.93070
[9] Gordienko E. I., Hernández-Lerma O.: Average cost Markov control processes with weighted norms: value iteration. Appl. Math. 23 (1995), 219-237 · Zbl 0829.93068
[10] Gordienko E. I., Montes-de-Oca R., Minjárez-Sosa J. A.: Approximation of average cost optimal policies for general Markov decision processes with unbounded costs. Math. Methods Oper. Res. 45 (1997), 2, to appear · Zbl 0882.90127 · doi:10.1007/BF01193864
[11] Hasminskii R., Ibragimov I.: On density estimation in the view of Kolmogorov’s ideas in approximation theory. Ann. of Statist. 18 (1990), 999-1010 · Zbl 0705.62039 · doi:10.1214/aos/1176347736
[12] Hernández-Lerma O.: Adaptive Markov Control Processes. Springer-Verlag, New York 1989 · Zbl 0698.90053 · doi:10.1007/978-1-4419-8714-3
[13] Hernández-Lerma O.: Infinite-horizon Markov control processes with undiscounted cost criteria: from average to overtaking optimality. Reporte Interno 165. Departamento de Matemáticas, CINVESTAV-IPN, A.P. 14-740.07000, México, D. F., México (1994). · Zbl 0906.93062
[14] Hernández-Lerma O., Cavazos-Cadena R.: Density estimation and adaptive control of Markov processes: average and discounted criteria. Acta Appl. Math. 20 (1990), 285-307 · Zbl 0717.93066 · doi:10.1007/BF00049572
[15] Hernández-Lerma O., Lasserre J. B.: Discrete-Time Markov Control Processes. Springer-Verlag, New York 1995 · Zbl 0928.93002
[16] Hernández-Lerma O., Marcus S. I.: Adaptive control of discounted Markov decision chains. J. Optim. Theory Appl. 46 (1985), 227-235 · Zbl 0543.90093 · doi:10.1007/BF00938426
[17] Hernández-Lerma O., Marcus S. I.: Adaptive policies for discrete-time stochastic control system with unknown disturbance distribution. Systems Control Lett. 9 (1987), 307-315 · Zbl 0637.93075 · doi:10.1016/0167-6911(87)90055-7
[18] Hinderer K.: Foundations of Non-Stationary Dynamic Programming with Discrete Time Parameter. (Lecture Notes in Operations Research and Mathematical Systems 33.) Springer-Verlag, Berlin - Heidelberg - New York 1970 · Zbl 0202.18401
[19] Köthe G.: Topological Vector Spaces I. Springer-Verlag, New York 1969 · Zbl 0179.17001
[20] Kumar P. R., Varaiya P.: Stochastic Systems: Estimation, Identification and Adaptive Control. Prentice-Hall, Englewood Cliffs 1986 · Zbl 0706.93057
[21] Lippman S. A.: On dynamic programming with unbounded rewards. Management Sci. 21 (1975), 1225-1233 · Zbl 0309.90017 · doi:10.1287/mnsc.21.11.1225
[22] Mandl P.: Estimation and control in Markov chains. Adv. in Appl. Probab. 6 (1974), 40-60 · Zbl 0281.60070 · doi:10.2307/1426206
[23] Rieder U.: Measurable selection theorems for optimization problems. Manuscripta Math. 24 (1978), 115-131 · Zbl 0385.28005 · doi:10.1007/BF01168566
[24] Schäl M.: Estimation and control in discounted stochastic dynamic programming. Stochastics 20 (1987), 51-71 · Zbl 0621.90092 · doi:10.1080/17442508708833435
[25] Stettner L.: On nearly self-optimizing strategies for a discrete-time uniformly ergodic adaptive model. J. Appl. Math. Optim. 27 (1993), 161-177 · Zbl 0769.93084 · doi:10.1007/BF01195980
[26] Stettner L.: Ergodic control of Markov process with mixed observation structure. Dissertationes Math. 341 (1995), 1-36
[27] Nunen J. A. E. E. van, Wessels J.: A note on dynamic programming with unbounded rewards. Management Sci. 24 (1978), 576-580 · Zbl 0374.49015 · doi:10.1287/mnsc.24.5.576
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.