The method of value oriented successive approximations for the average reward Markov decision process

J. van der Wal

Research output: Contribution to journal › Article › Academic › peer-review

4 Citations (Scopus)

Abstract

We consider the Markov decision process with finite state and action spaces under the criterion of average reward per unit time. We study the method of value oriented successive approximations, treated extensively by Van Nunen for the total reward case. Under a strong aperiodicity assumption, and under various conditions which guarantee that the gain of the process is independent of the starting state, we show that the method converges and produces nearly optimal policies.
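The method described in the abstract can be sketched numerically. The following is a minimal illustration, not the paper's exact algorithm: each outer iteration performs one Bellman-optimality update, then applies the operator of the resulting greedy policy k more times (the "value oriented" part), keeping iterates bounded via relative values. The 2-state, 2-action MDP data below is invented for illustration; its transition matrices have strictly positive diagonals, so a strong aperiodicity condition of the kind the paper assumes holds.

```python
import numpy as np

# Hypothetical 2-state, 2-action unichain MDP (numbers invented for
# illustration).  P[a][s, s'] = transition probability, r[a][s] = reward.
# All diagonal entries of P are positive (strong aperiodicity).
P = [np.array([[0.9, 0.1],
               [0.2, 0.8]]),
     np.array([[0.5, 0.5],
               [0.7, 0.3]])]
r = [np.array([1.0, 0.0]),
     np.array([2.0, 0.5])]
n_actions = 2


def value_oriented_vi(k=5, iters=1000, tol=1e-12):
    """Value-oriented successive approximations (sketch): one
    optimality update, then k extra updates with the greedy policy
    held fixed; relative values keep the iterates bounded."""
    v = np.zeros(2)
    g = 0.0
    f = np.zeros(2, dtype=int)
    for _ in range(iters):
        # Bellman-optimality update; record the greedy policy f.
        q = np.stack([r[a] + P[a] @ v for a in range(n_actions)])
        f = q.argmax(axis=0)
        w = q.max(axis=0)
        # Value-oriented step: k further applications of L_f.
        Pf = np.stack([P[f[s]][s] for s in range(2)])
        rf = np.array([r[f[s]][s] for s in range(2)])
        for _ in range(k):
            w = rf + Pf @ w
        # At convergence w - v ≈ (k+1)·g·e, so w[0] estimates (k+1)·g
        # (v is normalized so that v[0] = 0).
        span = np.ptp(w - v)
        g = w[0] / (k + 1)
        v = w - w[0]  # relative values: pin reference state 0 at zero
        if span < tol:
            break
    return g, f


gain, policy = value_oriented_vi()
```

For this data the method settles on the policy that takes action 1 in both states, whose average reward (computable by hand from the stationary distribution) is 1.375 per unit time.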
Original language: English
Pages (from-to): 233-242
Number of pages: 10
Journal: OR Spektrum
Volume: 1
Issue number: 4
DOIs
Publication status: Published - 1980

