The aim of this paper is to give a survey of recent developments in the area of successive approximations for MARKOV decision processes and MARKOV games. We will emphasize two aspects, viz. the conditions under which successive approximations converge in some strong sense and variations of these methods which diminish the amount of computational work to be executed. With respect to the first aspect it will be shown how much unboundedness of the rewards may be allowed without violation of the convergence. With respect to the second aspect we will present four ideas, that can be applied in conjunction, which may diminish the amount of work to be done. These ideas are: 1. the use of the actual convergence of the iterates for the construction of upper and lower bounds (MAcQuEEN bounds), 2. the use of alternative policy improvement procedures (based on stopping times), 3. a better evaluation of the values of actual policies in each iteration step by a value oriented approach, 4. the elimination of suboptimal actions not onIy permanently, but also temporarily. The general presentation is given for MARKOV decision processes with a finaI section devoted to the possibilities of extension to MARKOV games.
|Mathematische Operationsforschung und Statistik. Series Optimization
|Published - 1979