A stopping time-based policy iteration algorithm for Markov decision processes with discount factor tending to 1

J. van der Wal

    Research output: Book/Report › Report › Academic

    Abstract

    This paper considers the Markov decision process with finite state and action spaces when the discount factor tends to 1. Miller and Veinott have shown the existence of n-discount optimal policies, and Veinott has given an algorithm to determine one. In this paper we use stopping times, as introduced by Wessels, to generate a set of modified policy iteration algorithms for the determination of an n-discount optimal strategy.
    Original language: English
    Place of Publication: Eindhoven
    Publisher: Technische Hogeschool Eindhoven
    Number of pages: 17
    Publication status: Published - 1978

    Publication series

    Name: Memorandum COSOR
    Volume: 7824
    ISSN (Print): 0926-4493
