
Density estimation and adaptive control of Markov processes: Average and discounted criteria. (English) Zbl 0717.93066

Summary: We consider a class of discrete-time Markov control processes with Borel state and action spaces, and \({\mathbb{R}}^ d\)-valued i.i.d. disturbances with unknown distribution \(\mu\). Under mild semi-continuity and compactness conditions, and assuming that \(\mu\) is absolutely continuous with respect to Lebesgue measure, we establish the existence of adaptive control policies which are (1) optimal for the average-reward criterion, and (2) asymptotically optimal in the discounted case. Our results are obtained by exploiting some well-known facts from the theory of density estimation. On the one hand, this approach allows us to avoid restrictive conditions on the state space and/or on the system's transition law imposed in recent works; on the other, it clearly points the way to further applications of nonparametric (density) estimation in adaptive control.
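The plug-in idea behind the paper can be illustrated with a small sketch: estimate the unknown disturbance density from observed i.i.d. noise samples with a standard kernel density estimator, then choose actions by a certainty-equivalence Bellman step in which the expectation is taken under the estimated density rather than the unknown true one. All names, the scalar system \(x_{t+1} = F(x_t, u_t, \xi_t)\), and the grid quadrature below are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def gaussian_kde(samples, bandwidth):
    """One-dimensional Gaussian kernel density estimate of the unknown
    disturbance density mu (a standard nonparametric estimator; bandwidth
    choice is an assumption here, not taken from the paper)."""
    samples = np.asarray(samples, dtype=float)

    def density(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        z = (x[:, None] - samples[None, :]) / bandwidth
        return np.exp(-0.5 * z**2).sum(axis=1) / (
            len(samples) * bandwidth * np.sqrt(2.0 * np.pi)
        )

    return density

def adaptive_action(density, grid, state, actions, reward, next_state,
                    discount, value):
    """Certainty-equivalence step: maximize the one-step Bellman expression
    r(x,u) + beta * E[V(F(x,u,xi))], with the expectation computed under the
    *estimated* density via simple grid quadrature (illustrative only)."""
    dx = grid[1] - grid[0]
    weights = density(grid) * dx          # quadrature weights for E_mu_n[...]
    best_u, best_val = None, -np.inf
    for u in actions:
        val = reward(state, u) + discount * np.sum(
            weights * value(next_state(state, u, grid))
        )
        if val > best_val:
            best_u, best_val = u, val
    return best_u
```

For instance, with the hypothetical system \(x_{t+1} = x_t + u_t + \xi_t\), quadratic cost, and standard normal noise, the selected action pulls the state toward the origin once enough noise samples have been observed.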

MSC:

93E20 Optimal stochastic control
62G05 Nonparametric estimation
90C40 Markov and semi-Markov decision processes
93E10 Estimation and detection in stochastic control theory

References:

[1] Acosta Abreu, R.S.: Controlled Markov chains with unknown parameters and metric state space, Bol. Soc. Mat. Mexicana, in press (in Spanish). · Zbl 0765.93080
[2] Acosta Abreu, R. S. and Hernández-Lerma, O.: Iterative adaptive control of denumerable state average-cost Markov systems, Control Cyber. 14 (1985), 313-322. · Zbl 0606.90130
[3] Adams, R. A.: Sobolev Spaces, Academic Press, New York, 1975. · Zbl 0314.46030
[4] Ash, R. B.: Real Analysis and Probability, Academic Press, New York, 1972. · Zbl 0249.28001
[5] Bertsekas, D. P.: Dynamic Programming and Stochastic Control, Academic Press, New York, 1976. · Zbl 0549.93064
[6] Bertsekas, D. P. and Shreve, S. E.: Stochastic Optimal Control: The Discrete Time Case, Academic Press, New York, 1978. · Zbl 0471.93002
[7] Cavazos-Cadena, R.: Finite-state approximations for denumerable state discounted Markov decision processes, Appl. Math. Optim. 14 (1986), 1-26. · Zbl 0606.90132 · doi:10.1007/BF01442225
[8] Cavazos-Cadena, R.: Finite-state approximations and adaptive control of discounted Markov decision processes with unbounded rewards, Control Cyber. 16 (1987), 31-58. · Zbl 0678.93065
[9] Cavazos-Cadena, R.: Nonparametric adaptive control of discounted stochastic systems with compact state space, J. Optim. Theory Appl. 65 (1990), 191-207. · Zbl 0699.93053 · doi:10.1007/BF01102341
[10] Devroye, L.: A Course in Density Estimation, Birkhäuser, Boston, 1987. · Zbl 0617.62043
[11] Devroye, L. and Györfi, L.: Nonparametric Density Estimation: The L1 View, Wiley, New York, 1985. · Zbl 0546.62015
[12] Dynkin, E. B. and Yushkevich, A. A.: Controlled Markov Processes, Springer-Verlag, New York, 1979. · Zbl 0073.34801
[13] Doukhan, P. and Ghindès, M.: Etude du processus \(X_{n+1}=f(X_n)+\varepsilon_n\), C.R. Acad. Sci. Paris, Sér. A 290 (1980), 921-923. · Zbl 0433.60069
[14] Flynn, J.: Conditions for the equivalence of optimality criteria in dynamic programming, Ann. Statist. 4 (1976), 936-953. · Zbl 0351.93038 · doi:10.1214/aos/1176343590
[15] Georgin, J. P.: Estimation et contrôle des chaînes de Markov sur des espaces arbitraires, Lecture Notes Math. 636 (1978), 71-113. · Zbl 0372.60094 · doi:10.1007/BFb0063261
[16] Georgin, J. P.: Contrôle de chaînes de Markov sur des espaces arbitraires, Ann. Inst. H. Poincaré, Sect. B 14 (1978), 255-277. · Zbl 0391.60066
[17] Gihman, I. I. and Skorohod, A. V.: Controlled Stochastic Processes, Springer-Verlag, New York, 1979. · Zbl 0404.60061
[18] Gordienko, E. I.: Adaptive strategies for certain classes of controlled Markov processes, Theory Probab. Appl. 29 (1985), 504-518. · Zbl 0577.93067 · doi:10.1137/1129064
[19] Hernández-Lerma, O.: Nonstationary value-iteration and adaptive control of discounted semi-Markov processes, J. Math. Anal. Appl. 112 (1985), 435-445. · Zbl 0581.90096 · doi:10.1016/0022-247X(85)90253-7
[20] Hernández-Lerma, O.: Approximation and adaptive control of Markov processes: Average reward criterion, Kybernetika (Prague) 23 (1987), 265-288. · Zbl 0633.90091
[21] Hernández-Lerma, O.: Adaptive Markov Control Processes, Springer-Verlag, New York, 1989. · Zbl 0698.90053
[22] Hernández-Lerma, O. and Cavazos-Cadena, R.: Continuous dependence of stochastic control models on the noise distribution, Appl. Math. Optim. 17 (1988), 79-89. · Zbl 0639.93068 · doi:10.1007/BF01448360
[23] Hernández-Lerma, O. and Marcus, S. I.: Adaptive control of discounted Markov decision chains, J. Optim. Theory Appl. 46 (1985), 227-235. · Zbl 0543.90093 · doi:10.1007/BF00938426
[24] Hernández-Lerma, O. and Marcus, S. I.: Adaptive policies for discrete-time stochastic control systems with unknown disturbance distribution, Syst. Control Lett. 9 (1987), 307-315. · Zbl 0637.93075 · doi:10.1016/0167-6911(87)90055-7
[25] Hernández-Lerma, O., Esparza, S. O., and Duran, B. S.: Recursive nonparametric estimation of nonstationary Markov processes, Bol. Soc. Mat. Mexicana 33 (1988). · Zbl 0732.62086
[26] Himmelberg, C. J., Parthasarathy, T., and Van Vleck, F. S.: Optimal plans for dynamic programming problems, Math. Oper. Res. 1 (1976), 390-394. · Zbl 0368.90134 · doi:10.1287/moor.1.4.390
[27] Hinderer, K.: Foundations of Non-Stationary Dynamic Programming with Discrete Time Parameter, Lecture Notes Oper. Res. 33. Springer-Verlag, New York, 1970. · Zbl 0202.18401
[28] Iosifescu, M.: On two recent papers on ergodicity in nonhomogeneous Markov chains, Ann. Math. Statist. 43 (1972), 1732-1736. · Zbl 0249.60031 · doi:10.1214/aoms/1177692411
[29] Mandl, P.: Estimation and control in Markov chains, Adv. Appl. Probab. 6 (1974), 40-60. · Zbl 0281.60070 · doi:10.2307/1426206
[30] Prakasa Rao, B. L. S.: Nonparametric Functional Estimation, Academic Press, New York, 1983. · Zbl 0542.62025
[31] Ross, S. M.: Applied Probability Models with Optimization Applications, Holden-Day, San Francisco, 1970. · Zbl 0213.19101
[32] Rudin, W.: Functional Analysis, McGraw-Hill, New York, 1973. · Zbl 0253.46001
[33] Schäl, M.: Estimation and control in discounted stochastic dynamic programming, Stochastics 20 (1987), 51-71. · Zbl 0621.90092
[34] Schäl, M.: Conditions for optimality in dynamic programming and for the limit of n-stage optimal policies to be optimal, Z. Wahrsch. verw. Geb. 32 (1975), 179-196. · Zbl 0316.90080 · doi:10.1007/BF00532612
[35] Ueno, T.: Some limit theorems for temporally discrete Markov processes, J. Fac. Sci. Univ. Tokyo 7 (1957), 449-462. · Zbl 0077.33201
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced with data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible, without claiming completeness or a perfect matching.