Document Zbl 0556.90089

Hendrikx, Marcel; van Nunen, Jo; Wessels, Jaap

On iterative optimization of structured Markov decision processes with discounted rewards. (English) Zbl 0556.90089

Math. Operationsforsch. Stat., Ser. Optimization 15, 439-459 (1984).

The paper gives a survey on solution techniques for Markov decision processes with respect to the total reward criterion. We will discuss briefly a number of problem structures which guarantee that the optimal policy possesses a specific structure which can be exploited in numerical solution procedures. However, the main emphasis is on iterative methods. It is shown by examples that the effect of a number of modifications of the standard iterative method, which are advocated in the literature, is limited in some realistic situations. Numerical evidence is provided to show that exploiting the structure of a problem under consideration often yields a more substantial reduction of the required computational effort than some of the existing acceleration procedures. We advocate that this structure should be analyzed and used in choosing the appropriate solution procedure. The appropriate procedure might be composed on one hand by blending several of the acceleration concepts that are described in literature. On the other hand it should exploit the structure of optimal policies. This latter can often be done by using monotonicity properties of the computed value vector or policy. To illustrate the influence of the structure of a problem and if available, the structure of optimal policies, we sketch and solve four test problems with several successive approximation methods. In the final part of the paper, the required computational efforts are compared.

Cited in 5 Documents

MSC:

90C40	Markov and semi-Markov decision processes
65K05	Numerical mathematical programming methods

Keywords:

survey on solution techniques; total reward criterion; optimal policy; iterative methods; test problems; successive approximation

Cite Review PDF

Full Text: DOI