TY - JOUR

T1 - Successive approximations for Markov decision processes and Markov games with unbounded rewards

AU - Wessels, J.

AU - van Nunen, J.A.E.E.

PY - 1979

Y1 - 1979

N2 - The aim of this paper is to give a survey of recent developments in the area of successive approximations for MARKOV decision processes and MARKOV games. We will
emphasize two aspects, viz. the conditions under which successive approximations converge in some strong sense and variations of these methods which diminish the amount of computational work to be executed. With respect to the first aspect it will be shown how much unboundedness of the rewards may be allowed without violation of the convergence.
With respect to the second aspect we will present four ideas, that can be applied in conjunction, which may diminish the amount of work to be done. These ideas are: 1. the use of the actual convergence of the iterates for the construction of upper and lower bounds
(MAcQuEEN bounds), 2. the use of alternative policy improvement procedures (based on
stopping times), 3. a better evaluation of the values of actual policies in each iteration step
by a value oriented approach, 4. the elimination of suboptimal actions not onIy permanently, but also temporarily. The general presentation is given for MARKOV decision processes with a finaI section devoted to the possibilities of extension to MARKOV games.

AB - The aim of this paper is to give a survey of recent developments in the area of successive approximations for MARKOV decision processes and MARKOV games. We will
emphasize two aspects, viz. the conditions under which successive approximations converge in some strong sense and variations of these methods which diminish the amount of computational work to be executed. With respect to the first aspect it will be shown how much unboundedness of the rewards may be allowed without violation of the convergence.
With respect to the second aspect we will present four ideas, that can be applied in conjunction, which may diminish the amount of work to be done. These ideas are: 1. the use of the actual convergence of the iterates for the construction of upper and lower bounds
(MAcQuEEN bounds), 2. the use of alternative policy improvement procedures (based on
stopping times), 3. a better evaluation of the values of actual policies in each iteration step
by a value oriented approach, 4. the elimination of suboptimal actions not onIy permanently, but also temporarily. The general presentation is given for MARKOV decision processes with a finaI section devoted to the possibilities of extension to MARKOV games.

M3 - Article

VL - 10

SP - 431

EP - 455

JO - Mathematische Operationsforschung und Statistik. Series Optimization

JF - Mathematische Operationsforschung und Statistik. Series Optimization

SN - 0323-3898

ER -