
On the convergence of Markovian stochastic algorithms with rapidly decreasing ergodicity rates. (English) Zbl 0949.65006

A new modification of the stochastic algorithms of Robbins-Monro type is proposed.
Consider the following framework. Let \(\Omega\) be a probability space, and \((\pi_\theta, \theta\in \mathbb{R}^d)\) a family of probability distributions on \(\Omega\) indexed by a parameter \(\theta\). Let also, for all \(\theta\), \(f(\theta,\cdot)= f_\theta(\cdot)\) be a function defined on \(\Omega\) with values in \(\mathbb{R}^d\), and set \[ h(\theta)= \int_\Omega f(\theta, x) \pi_\theta(dx). \] The problem is to solve the equation \(h(\theta)= 0\). For this purpose (especially when \(h(\theta)\) is the gradient of a function \(L\) to maximize), one may follow the dynamical system \(\dot\theta= h(\theta)\) in its discretized form \[ \theta_{n+1}= \theta_n+ \gamma h(\theta_n),\tag{1} \] where \(\gamma\) is the time discretization step. H. Robbins and S. Monro [Ann. Math. Statist. 22, 400-407 (1951; Zbl 0054.05901)] considered the case when at step \(n\) of the algorithm (1) a sample \(X^{n+1}\) of the distribution \(\pi_{\theta_n}\) is given, and proposed an updating rule of the kind \[ \theta_{n+1}= \theta_n+ \gamma_{n+1} f(\theta_n, X^{n+ 1}).\tag{2} \]
In this paper a modification of (2) is studied. Let \(\theta_n\) and \(X^n\) be the current parameter and state; several iterations of a simulation procedure are performed before updating \(\theta_n\). Let \(k_n\) be an integer, which may depend on \(\theta_n\), and let \(Y^{n+ 1,0},\dots, Y^{n+1, k_n}\) (with \(Y^{n+ 1,0}= X^n\)) be a Markov chain in \(\Omega\), associated with the transition probability \(P_{\theta_n}\) (that is, \(P(Y^{n+1, k}\in\cdot\mid Y^{n+ 1,k-1}= x)= P_{\theta_n}(x,\cdot)\)), which is ergodic and converges in distribution to \(\pi_{\theta_n}\). The author defines the recursion \[ \theta_{n+ 1}= \theta_n+ \gamma_{n+ 1} \Biggl[{1\over k_n} \sum^{k_n}_{k= 1} f(\theta_n, Y^{n+ 1,k})\Biggr]\tag{3} \] and \[ X^{n+ 1}= Y^{n+ 1,k_n}.\tag{4} \] Some examples which fall into this framework are described.
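To make recursions (3)–(4) concrete, the following is a minimal sketch (not taken from the paper) on a toy problem: \(\pi_\theta\) is the normal law \(N(\theta,1)\), \(f(\theta,x)= a-x\), so \(h(\theta)= a-\theta\) and the root is \(\theta^*= a\); the chain with transition \(P_\theta\) is an AR(1) process with invariant law \(N(\theta,1)\). All names and parameter choices are illustrative.

```python
import math
import random

def markovian_rm(a=2.0, rho=0.5, k_n=5, n_iter=2000, seed=0):
    """Sketch of the Markovian Robbins-Monro recursions (3)-(4).

    Toy instance: pi_theta = N(theta, 1), f(theta, x) = a - x,
    hence h(theta) = a - theta with root theta* = a.  The chain
    Y^{n+1,k} is AR(1) with invariant distribution N(theta, 1).
    """
    rng = random.Random(seed)
    theta, x = 0.0, 0.0                # theta_0 and X^0
    for n in range(1, n_iter + 1):
        gamma = 1.0 / n                # decreasing gain gamma_{n+1}
        y, total = x, 0.0              # Y^{n+1,0} = X^n
        for _ in range(k_n):
            # one step of the ergodic transition P_{theta_n}
            y = theta + rho * (y - theta) + math.sqrt(1 - rho**2) * rng.gauss(0, 1)
            total += a - y             # f(theta_n, Y^{n+1,k})
        theta += gamma * total / k_n   # recursion (3): averaged update
        x = y                          # recursion (4): X^{n+1} = Y^{n+1,k_n}
    return theta
```

With these choices the iterates drift toward the root \(\theta^*=a\); the averaging over \(k_n\) chain steps reduces the variance of each update compared with the single-sample rule (2).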
Almost sure convergence results for the Markovian stochastic algorithms (3)–(4) under different hypotheses on the gain and the approximation of the ordinary differential equation are obtained. In particular, in order to obtain convergence with little restriction on the size of \(\gamma_n\), the averaging time \(k_n\) has to be adapted to the ergodicity of the Markov chain with transition \(P_{\theta_n}\).
Such an adaptive rule yields an algorithm which reacts to situations in which the current parameter accidentally reaches values far from the limit by improving the precision of the gradient approximation. At the end of the paper, the author proves a theorem stating that the algorithm also exhibits Gaussian behavior as it approaches convergence. This theorem can provide hints for designing numerical strategies in practice.
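The adaptive principle can be illustrated with a hedged sketch (this rule is illustrative, not the paper's exact one): suppose the AR(1) coefficient \(\rho(\theta)\) approaches \(1\) as \(|\theta|\) grows, so the chain mixes slowly far from the root; the averaging time \(k_n\) is then taken proportional to the chain's relaxation time \(1/(1-\rho(\theta_n))\).

```python
import math
import random

def adaptive_rm(a=2.0, n_iter=1000, seed=1):
    """Illustrative adaptive choice of k_n (assumption, not the
    paper's rule): the mixing of the chain degrades as |theta|
    grows, and k_n is scaled with the relaxation time so that far
    from the root more chain steps are averaged per update."""
    rng = random.Random(seed)
    theta, x = 10.0, 10.0              # start far from the root theta* = a
    for n in range(1, n_iter + 1):
        # slower mixing for large |theta|; rho(theta) -> 1 as |theta| -> oo
        rho = min(0.95, abs(theta) / (1.0 + abs(theta)))
        # k_n proportional to the relaxation time 1/(1 - rho)
        k_n = max(1, math.ceil(2.0 / (1.0 - rho)))
        gamma = 1.0 / n
        y, total = x, 0.0
        for _ in range(k_n):
            y = theta + rho * (y - theta) + math.sqrt(1 - rho**2) * rng.gauss(0, 1)
            total += a - y
        theta += gamma * total / k_n   # recursion (3) with adaptive k_n
        x = y                          # recursion (4)
    return theta
```

The design choice is the one the review highlights: when the current parameter strays far from the limit, the chain with transition \(P_{\theta_n}\) mixes more slowly, and increasing \(k_n\) restores the precision of the averaged gradient estimate.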
Reviewer: N.Semejko (Kyïv)

MSC:

65C60 Computational problems in statistics (MSC2010)
65C50 Other computational problems in probability (MSC2010)
62F10 Point estimation
62M30 Inference from spatial processes
60K35 Interacting random processes; statistical mechanics type models; percolation theory

Citations:

Zbl 0054.05901