Successive approximations for Markov decision processes and Markov games with unbounded rewards

J. Wessels, J.A.E.E. van Nunen

Research output: Contribution to journalArticleAcademicpeer-review

Abstract

The aim of this paper is to give a survey of recent developments in the area of successive approximations for MARKOV decision processes and MARKOV games. We will emphasize two aspects, viz. the conditions under which successive approximations converge in some strong sense and variations of these methods which diminish the amount of computational work to be executed. With respect to the first aspect it will be shown how much unboundedness of the rewards may be allowed without violation of the convergence. With respect to the second aspect we will present four ideas, that can be applied in conjunction, which may diminish the amount of work to be done. These ideas are: 1. the use of the actual convergence of the iterates for the construction of upper and lower bounds (MAcQuEEN bounds), 2. the use of alternative policy improvement procedures (based on stopping times), 3. a better evaluation of the values of actual policies in each iteration step by a value oriented approach, 4. the elimination of suboptimal actions not onIy permanently, but also temporarily. The general presentation is given for MARKOV decision processes with a finaI section devoted to the possibilities of extension to MARKOV games.
Original languageEnglish
Pages (from-to)431-455
JournalMathematische Operationsforschung und Statistik. Series Optimization
Volume10
Publication statusPublished - 1979

Fingerprint

Dive into the research topics of 'Successive approximations for Markov decision processes and Markov games with unbounded rewards'. Together they form a unique fingerprint.

Cite this