Document Zbl 1049.93095

On actor-critic algorithms. (English) Zbl 1049.93095

SIAM J. Control Optimization 42, No. 4, 1143-1166 (2003).

Summary: We propose and analyze a class of actor-critic algorithms. These are two-time-scale algorithms in which the critic uses temporal difference learning with a linearly parameterized approximation architecture, and the actor is updated in an approximate gradient direction, based on information provided by the critic. We show that the features for the critic should ideally span a subspace prescribed by the choice of parameterization of the actor. We study actor-critic algorithms for Markov decision processes with Polish state and action spaces. We state and prove two results regarding their convergence.
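
Illustration (not part of the published summary): the two-time-scale structure described above can be sketched on a toy problem. The following is a minimal sketch under stated assumptions, not the authors' algorithm. It uses a tabular softmax actor, an average-reward TD(0) critic that is linear in the features psi(s, a) = grad_theta log pi(a | s) (one choice of features lying in the subspace determined by the actor parameterization), and a hypothetical two-state, two-action environment. All names (`policy`, `compatible_features`, `env_step`, `actor_critic`), the step-size exponents, and the environment are assumptions made for illustration.

```python
import math
import random

N_STATES, N_ACTIONS = 2, 2  # toy sizes, chosen only for illustration


def policy(theta, s):
    """Softmax action probabilities in state s; theta[s][a] are preferences."""
    m = max(theta[s])
    exps = [math.exp(x - m) for x in theta[s]]
    z = sum(exps)
    return [e / z for e in exps]


def compatible_features(theta, s, a):
    """psi(s, a) = grad_theta log pi(a | s), flattened to one vector."""
    probs = policy(theta, s)
    psi = [0.0] * (N_STATES * N_ACTIONS)
    for b in range(N_ACTIONS):
        psi[s * N_ACTIONS + b] = (1.0 if b == a else 0.0) - probs[b]
    return psi


def env_step(a, rng):
    """Hypothetical dynamics: action 1 pays reward 1, action 0 pays 0;
    the next state is drawn uniformly at random."""
    reward = 1.0 if a == 1 else 0.0
    return reward, rng.randrange(N_STATES)


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def sample_action(theta, s, rng):
    return rng.choices(range(N_ACTIONS), weights=policy(theta, s))[0]


def actor_critic(n_steps=20000, seed=0):
    rng = random.Random(seed)
    theta = [[0.0] * N_ACTIONS for _ in range(N_STATES)]  # actor parameters
    w = [0.0] * (N_STATES * N_ACTIONS)                    # critic weights
    avg_reward = 0.0
    s = 0
    a = sample_action(theta, s, rng)
    for k in range(1, n_steps + 1):
        beta = (k + 1) ** -0.6   # critic step size (faster time scale)
        alpha = (k + 1) ** -0.9  # actor step size; alpha / beta -> 0
        reward, s_next = env_step(a, rng)
        a_next = sample_action(theta, s_next, rng)
        psi = compatible_features(theta, s, a)
        psi_next = compatible_features(theta, s_next, a_next)
        q = dot(w, psi)  # critic's linear estimate for (s, a)
        # Average-reward TD(0) error.
        delta = reward - avg_reward + dot(w, psi_next) - q
        avg_reward += beta * (reward - avg_reward)
        w = [wi + beta * delta * p for wi, p in zip(w, psi)]
        # Actor: approximate gradient step using the critic's estimate.
        for b in range(N_ACTIONS):
            theta[s][b] += alpha * q * psi[s * N_ACTIONS + b]
        s, a = s_next, a_next
    return theta, w, avg_reward
```

Calling `actor_critic()` returns the learned actor parameters, critic weights, and running average-reward estimate. The exponents 0.6 and 0.9 are one arbitrary way to make the actor step size vanish relative to the critic step size; the paper's own step-size conditions and its use of temporal difference learning are more general than this sketch.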

Cited in 92 Documents

MSC:

93E35	Stochastic learning and adaptive control
68T05	Learning and adaptive systems in artificial intelligence

Keywords:

reinforcement learning; Markov decision processes; actor-critic algorithms; stochastic approximation
