The method of value oriented successive approximations for the average reward Markov decision process

J. van der Wal

    Research output: Contribution to journal › Article › Academic › peer-review

    4 Citations (Scopus)

    Abstract

    We consider the Markov decision process with finite state and action spaces under the criterion of average reward per unit time. We study the method of value oriented successive approximations, extensively treated by Van Nunen for the total reward case. Under a strong aperiodicity assumption, and under various conditions which guarantee that the gain of the process is independent of the starting state, we show that the method converges and produces nearly optimal policies.
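The method described in the abstract can be sketched in code. The following is a hypothetical illustration, not the paper's exact algorithm: the function name, the span-based stopping rule, and the relative-value normalization are assumptions. The characteristic "value oriented" step is that each sweep performs one greedy improvement and then applies the chosen policy's operator several more times (a partial policy evaluation) before improving again; setting that count to zero recovers ordinary successive approximations.

```python
import numpy as np

def value_oriented_vi(P, r, lam=5, eps=1e-8, max_iter=10_000):
    """Sketch of value oriented successive approximations (average reward).

    P: (A, S, S) transition matrices, assumed strongly aperiodic and
       unichain under every policy (cf. the paper's assumptions).
    r: (A, S) one-step rewards.
    lam: extra applications of the greedy policy's operator per sweep;
         lam = 0 reduces to ordinary successive approximations.
    """
    A, S, _ = P.shape
    v = np.zeros(S)
    for _ in range(max_iter):
        Q = r + P @ v               # Q[a, s] = r(s, a) + sum_z P(z | s, a) v(z)
        pi = Q.argmax(axis=0)       # greedy improvement step
        w = Q.max(axis=0)
        P_pi = P[pi, np.arange(S)]  # transition matrix of the greedy policy
        r_pi = r[pi, np.arange(S)]
        for _ in range(lam):        # value oriented step: partial evaluation
            w = r_pi + P_pi @ w
        diff = w - v                # grows by about (lam + 1) * gain per sweep
        if diff.max() - diff.min() < eps:   # span-based stopping rule
            break
        v = w - w.min()             # relative normalization keeps v bounded
    gain = (diff.max() + diff.min()) / (2 * (lam + 1))
    return gain, pi
```

On a small two-state, two-action example the sweep converges to the maximal gain and a gain-optimal deterministic policy; the division by `lam + 1` accounts for the fact that each sweep applies `lam + 1` one-step operators, each of which adds roughly one unit of gain to the value vector near convergence.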
    Original language: English
    Pages (from-to): 233-242
    Number of pages: 10
    Journal: OR Spektrum
    Volume: 1
    Issue number: 4
    Publication status: Published - 1980

