TY - JOUR
T1 - Successive approximations for Markov decision processes and Markov games with unbounded rewards
AU - Wessels, J.
AU - van Nunen, J.A.E.E.
PY - 1979
Y1 - 1979
N2 - The aim of this paper is to give a survey of recent developments in the area of successive approximations for MARKOV decision processes and MARKOV games. We will
emphasize two aspects, viz. the conditions under which successive approximations converge in some strong sense and variations of these methods which diminish the amount of computational work to be executed. With respect to the first aspect it will be shown how much unboundedness of the rewards may be allowed without violation of the convergence.
With respect to the second aspect we will present four ideas, that can be applied in conjunction, which may diminish the amount of work to be done. These ideas are: 1. the use of the actual convergence of the iterates for the construction of upper and lower bounds
(MAcQuEEN bounds), 2. the use of alternative policy improvement procedures (based on
stopping times), 3. a better evaluation of the values of actual policies in each iteration step
by a value oriented approach, 4. the elimination of suboptimal actions not onIy permanently, but also temporarily. The general presentation is given for MARKOV decision processes with a finaI section devoted to the possibilities of extension to MARKOV games.
AB - The aim of this paper is to give a survey of recent developments in the area of successive approximations for MARKOV decision processes and MARKOV games. We will
emphasize two aspects, viz. the conditions under which successive approximations converge in some strong sense and variations of these methods which diminish the amount of computational work to be executed. With respect to the first aspect it will be shown how much unboundedness of the rewards may be allowed without violation of the convergence.
With respect to the second aspect we will present four ideas, that can be applied in conjunction, which may diminish the amount of work to be done. These ideas are: 1. the use of the actual convergence of the iterates for the construction of upper and lower bounds
(MAcQuEEN bounds), 2. the use of alternative policy improvement procedures (based on
stopping times), 3. a better evaluation of the values of actual policies in each iteration step
by a value oriented approach, 4. the elimination of suboptimal actions not onIy permanently, but also temporarily. The general presentation is given for MARKOV decision processes with a finaI section devoted to the possibilities of extension to MARKOV games.
M3 - Article
SN - 0323-3898
VL - 10
SP - 431
EP - 455
JO - Mathematische Operationsforschung und Statistik. Series Optimization
JF - Mathematische Operationsforschung und Statistik. Series Optimization
ER -