×

Ergodic and adaptive control of nearest-neighbor motions. (English) Zbl 0736.93078

A controlled Markov chain \(X_ n\), \(n=0,1,2,\dots\) on the state space \(S=\{0,1,2,\dots\}\) with transition matrix \(P_{u,\theta}=((p(i,j,u_ i,\theta)))\), \(i,j\in S\) is considered, where \(u=[u_ 1,u_ 2,\dots]\) is the control vector, \(\theta\) is the “unknown parameter”, here \(u_ i\in D_ i\), \(\theta\in A\), \(D_ i\), \(A\) are compact sets. A sequence \(\{\xi_ n\}\), \(\{\xi_ n=[\xi_ n(0),\xi_ n(1),\dots]\) is a control strategy (CS), if for each \(i\in S\) and \(n\geq 0\), \(P_ \theta(X_{n+1}=i\mid X_ m,\xi_ m,\quad m\leq n)=p(X_ n,i\xi_ n(X_ n),\theta)\); if \(\{\xi_ n\}\) are identically distributed and \(\xi_ n\) is independent of \(X_ m\), \(m\leq n\), \(\xi_ m\), \(m<n\), for each \(n\), that \(\{\xi_ n\}\) is called a stationary randomized strategy (SRS).
Under the following assumptions: 1) under any SRS, \(\{X_ n\}\) has a single communicating class \(S\); 2) for each \(i\in S\), there is a finite set \(R_ i\subset S\) such that \(p(i,j,\cdot;\theta)\equiv 0\) for \(j\not\in R_ i\). For \(j\in R_ i\), the quantity \(I\{p(i,j,u;\theta)>0\}\times\ell_ R(p(i,j,u;\theta)/p(i,j,u,\theta_ 0))\) is uniformly bounded in \(i\), \(j\), \(u\), \(\theta\) and continuous in \(\theta\) uniformly in \(i\), \(j\), \(u\); 3) for \(\theta\neq\theta'\) in \(A\), there exists an \(i=i(\theta,\theta')\in S\) such that, for every \(u\in D\), there is a \(j=j(i,u,\theta,\theta')\in S\) with \(p(i,j,u;\theta)\neq p(i,j,u,\theta')\).
The self-tuning approach to adaptive control is applied to a class of Markov chains mentioned above, which is called nearest-neighbour motions. For compact parameter and control spaces, the almost-sure optimality of self-tuner for an ergodic cost criterion is established.

MSC:

93E20 Optimal stochastic control
93C40 Adaptive control/observation systems
Full Text: DOI

References:

[1] V. E. Beneš, Existence of optimal strategies based on specified information for a class of stochastic decision problems,SIAM J. Control Optim.,8 (1970), 179–188. · Zbl 0195.48202 · doi:10.1137/0308012
[2] V. S. Borkar, Controlled Markov chains and stochastic networks,SIAM J. Control Optim.,21 (1983), 652–666. · doi:10.1137/0321039
[3] V. S. Borkar, On minimum cost per unit time control of Markov chains,SIAM J. Control Optim.,22 (1984), 965–978. · Zbl 0566.93069 · doi:10.1137/0322062
[4] V. S. Borkar, Control of Markov chains with long-run average cost criterion, inProceedings of the I.M.A. Workshop on Stochastic Differential Systems with Application to Electrical/Computer Engineering, Control Theory and Operations Research (W. Fleming and P. L. Lions, eds.), pp. 57–77, Springer-Verlag, New York, 1987.
[5] V. S. Borkar, A convex analytic approach to Markov decision processes,Probab. Theory Related Fields,78 (1988), 583–602. · Zbl 0628.90090 · doi:10.1007/BF00353877
[6] V. S. Borkar, Control of Markov chains with long-run average cost criterion: the dynamic programming equations,SIAM J. Control Optim.,27 (1989), 642–657. · Zbl 0668.60059 · doi:10.1137/0327034
[7] V. S. Borkar, The Kumar-Becker-Lin scheme revisited,J. Optim. Theory Appl. (to appear in August 1990). · Zbl 0682.93060
[8] V. S. Borkar, and P. Varaiya, Adaptive control of Markov chain, I: Finite parameter set,IEEE Trans. Automat. Control,24 (1979), 953–957. · Zbl 0416.93065 · doi:10.1109/TAC.1979.1102191
[9] V. S. Borkar, and P. Varaiya, Identification and adaptive control of Markov chains,SIAM J. Control Optim.,20 (1982), 470–489. · Zbl 0491.93063 · doi:10.1137/0320035
[10] B. Hajek, Hitting-time and occupation-time bounds implied by drift analysis with applications,Adv. in Appl. Probab.,14 (1982), 502–525. · Zbl 0495.60094 · doi:10.2307/1426671
[11] O. Hernandez-Lerma and S. I. Marcus, Adaptive control of service in queueing systems,Systems Control Lett.,3 (1983), 283–289. · Zbl 0534.90037 · doi:10.1016/0167-6911(83)90027-0
[12] O. Hernandez-Lerma and S. I. Marcus, Optimal adaptive control of priority assignment in queueing systems,Systems Control Lett.,4, (1984), 65–72. · Zbl 0529.90045 · doi:10.1016/S0167-6911(84)80053-5
[13] P. R. Kumar, Survey of results in stochastic adaptive control,SIAM J. Control Optim.,23 (1985), 329–380. · Zbl 0571.93038 · doi:10.1137/0323023
[14] P. R. Kumar and W. Lin, Optimal adaptive controllers for unknown Markov chains,IEEE Trans. Automat. Control,27 (1982), 765–774. · Zbl 0488.93036 · doi:10.1109/TAC.1982.1103017
[15] P. Mandl, Estimation and control in Markov chains,Adv. in Appl. Probab.,6 (1974), 40–60. · Zbl 0281.60070 · doi:10.2307/1426206
[16] B. Sagalovsky, Adaptive control and parameter estimation in Markov chains: a linear case,IEEE Trans. Automat. Control,27 (1982), 137–146. · Zbl 0479.93061 · doi:10.1109/TAC.1982.1102921
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.