
On the convergence of Markovian stochastic algorithms with rapidly decreasing ergodicity rates. (English) Zbl 0949.65006

A new modification of the stochastic algorithms of Robbins-Monro type is proposed.
Consider the following framework. Let \(\Omega\) be a probability space, and \((\pi_\theta, \theta\in \mathbb{R}^d)\) a family of probability distributions on \(\Omega\) indexed by a parameter \(\theta\). Let also, for all \(\theta\), \(f(\theta,\cdot)= f_\theta(\cdot)\) be a function defined on \(\Omega\) with values in \(\mathbb{R}^d\), and set \[ h(\theta)= \int_\Omega f(\theta, x) \pi_\theta(dx). \] The problem is to solve the equation \(h(\theta)= 0\). For this purpose (especially when \(h(\theta)\) is the gradient of a function \(L\) to maximize), one may follow the dynamical system \(\dot\theta= h(\theta)\) in its discretized form \[ \theta_{n+1}= \theta_n+ \gamma h(\theta_n),\tag{1} \] where \(\gamma\) is the time discretization step. H. Robbins and S. Monro [Ann. Math. Statist. 22, 400-407 (1951; Zbl 0054.05901)] considered the case when at step \(n\) of the algorithm (1) a sample \(X^{n+1}\) of the distribution \(\pi_{\theta_n}\) is given, and proposed an updating rule of the kind \[ \theta_{n+1}= \theta_n+ \gamma_{n+1} f(\theta_n, X^{n+ 1}).\tag{2} \]
In this paper a modification of (2) is studied. Let \(\theta_n\) and \(X^n\) be the current parameter and state; several iterations of a simulation procedure are performed before updating \(\theta_n\). Let \(k_n\) be an integer, which may depend on \(\theta_n\), and let \(Y^{n+ 1,0},\dots, Y^{n+1, k_n}\) (with \(Y^{n+ 1,0}= X^n\)) be a Markov chain in \(\Omega\), associated with the transition probability \(P_{\theta_n}\) (that is, \(P(Y^{n+1, k}\in\cdot\mid Y^{n+ 1,k-1}= x)= P_{\theta_n}(x,\cdot)\)), which is ergodic and converges in distribution to \(\pi_{\theta_n}\). The author defines the recursion \[ \theta_{n+ 1}= \theta_n+ \gamma_{n+ 1} \Biggl[{1\over k_n} \sum^{k_n}_{k= 1} f(\theta_n, Y^{n+ 1,k})\Biggr]\tag{3} \] and \[ X^{n+ 1}= Y^{n+ 1,k_n}.\tag{4} \] Some examples which fall into this framework are described.
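To make recursions (3)–(4) concrete, the following is a minimal sketch (not taken from the paper) on a toy problem: \(\pi_\theta\) is the normal law \(N(\theta,1)\), \(f(\theta,x)= a-x\), so \(h(\theta)= a-\theta\) and the root is \(\theta^*= a\); the chain with transition \(P_\theta\) is an AR(1) process with invariant law \(N(\theta,1)\). All names and parameter choices are illustrative.

```python
import math
import random

def markovian_rm(a=2.0, rho=0.5, k_n=5, n_iter=2000, seed=0):
    """Sketch of the Markovian Robbins-Monro recursions (3)-(4).

    Toy instance: pi_theta = N(theta, 1), f(theta, x) = a - x,
    hence h(theta) = a - theta with root theta* = a.  The chain
    Y^{n+1,k} is AR(1) with invariant distribution N(theta, 1).
    """
    rng = random.Random(seed)
    theta, x = 0.0, 0.0                # theta_0 and X^0
    for n in range(1, n_iter + 1):
        gamma = 1.0 / n                # decreasing gain gamma_{n+1}
        y, total = x, 0.0              # Y^{n+1,0} = X^n
        for _ in range(k_n):
            # one step of the ergodic transition P_{theta_n}
            y = theta + rho * (y - theta) + math.sqrt(1 - rho**2) * rng.gauss(0, 1)
            total += a - y             # f(theta_n, Y^{n+1,k})
        theta += gamma * total / k_n   # recursion (3): averaged update
        x = y                          # recursion (4): X^{n+1} = Y^{n+1,k_n}
    return theta
```

With these choices the iterates drift toward the root \(\theta^*=a\); the averaging over \(k_n\) chain steps reduces the variance of each update compared with the single-sample rule (2).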
Almost sure convergence results for the Markovian stochastic algorithms (3)–(4) under different hypotheses on the gain and the approximation of the ordinary differential equation are obtained. In particular, in order to obtain convergence with little restriction on the size of \(\gamma_n\), the averaging time \(k_n\) has to be adapted to the ergodicity of the Markov chain with transition \(P_{\theta_n}\).
Such an adaptive rule yields an algorithm which reacts to situations in which the current parameter accidentally reaches values far from the limit by improving the precision of the gradient approximation. At the end of the paper, the author proves a theorem stating that the algorithm also exhibits Gaussian behavior as it approaches convergence. This theorem can provide hints for designing numerical strategies in practice.
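The adaptive principle can be illustrated with a hedged sketch (this rule is illustrative, not the paper's exact one): suppose the AR(1) coefficient \(\rho(\theta)\) approaches \(1\) as \(|\theta|\) grows, so the chain mixes slowly far from the root; the averaging time \(k_n\) is then taken proportional to the chain's relaxation time \(1/(1-\rho(\theta_n))\).

```python
import math
import random

def adaptive_rm(a=2.0, n_iter=1000, seed=1):
    """Illustrative adaptive choice of k_n (assumption, not the
    paper's rule): the mixing of the chain degrades as |theta|
    grows, and k_n is scaled with the relaxation time so that far
    from the root more chain steps are averaged per update."""
    rng = random.Random(seed)
    theta, x = 10.0, 10.0              # start far from the root theta* = a
    for n in range(1, n_iter + 1):
        # slower mixing for large |theta|; rho(theta) -> 1 as |theta| -> oo
        rho = min(0.95, abs(theta) / (1.0 + abs(theta)))
        # k_n proportional to the relaxation time 1/(1 - rho)
        k_n = max(1, math.ceil(2.0 / (1.0 - rho)))
        gamma = 1.0 / n
        y, total = x, 0.0
        for _ in range(k_n):
            y = theta + rho * (y - theta) + math.sqrt(1 - rho**2) * rng.gauss(0, 1)
            total += a - y
        theta += gamma * total / k_n   # recursion (3) with adaptive k_n
        x = y                          # recursion (4)
    return theta
```

The design choice is the one the review highlights: when the current parameter strays far from the limit, the chain with transition \(P_{\theta_n}\) mixes more slowly, and increasing \(k_n\) restores the precision of the averaged gradient estimate.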
Reviewer: N.Semejko (Kyïv)

MSC:

65C60 Computational problems in statistics (MSC2010)
65C50 Other computational problems in probability (MSC2010)
62F10 Point estimation
62M30 Inference from spatial processes
60K35 Interacting random processes; statistical mechanics type models; percolation theory

Citations:

Zbl 0054.05901