A stopping time-based policy iteration algorithm for Markov decision processes with discountfactor tending to 1

J. van der Wal

Research output: Book/Report, Report, Academic

Abstract

This paper considers the Markov decision process with finite state and action spaces when the discount factor tends to 1. Miller and Veinott have shown the existence of n-discount optimal policies, and Veinott has given an algorithm to determine one. In this paper we use stopping times, as introduced by Wessels, to generate a set of modified policy iteration algorithms for the determination of an n-discount optimal strategy.
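The abstract does not spell out the algorithms themselves. As a rough illustration of the stopping-time idea only, the sketch below implements ordinary modified policy iteration for a fixed discount factor beta < 1. Each iteration picks a policy that is greedy with respect to the current value estimate and then evaluates that policy only up to a fixed stopping time of m steps. With m = 1 this reduces to successive approximation (value iteration); as m grows it approaches Howard-style policy iteration. It does not treat the limit beta → 1 or n-discount optimality, which is the actual subject of the report. The function name `modified_policy_iteration`, the array layout of `P` and `r`, and the stopping rule based on `tol` are assumptions made for this example.

```python
import numpy as np


def modified_policy_iteration(P, r, beta, m, tol=1e-10, max_iter=10_000):
    """Hypothetical sketch of modified policy iteration with a fixed stopping time m.

    P: shape (A, S, S), P[a, s, t] = probability of moving from s to t under action a
    r: shape (A, S), expected one-step reward of action a in state s
    beta: discount factor, 0 <= beta < 1 (fixed; the beta -> 1 limit is not handled)
    m: number of evaluation steps per iteration (m = 1 gives value iteration)
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    A, S, _ = P.shape
    states = np.arange(S)
    v = np.zeros(S)
    policy = np.zeros(S, dtype=int)
    for _ in range(max_iter):
        # Policy improvement: one-step lookahead values q[a, s], greedy action per state.
        q = r + beta * (P @ v)
        policy = np.argmax(q, axis=0)
        v_new = q[policy, states]
        # Partial policy evaluation: apply the operator of the chosen policy m - 1 more times,
        # i.e. follow the policy until the stopping time m, then use the current estimate.
        P_f = P[policy, states, :]
        r_f = r[policy, states]
        for _ in range(m - 1):
            v_new = r_f + beta * (P_f @ v_new)
        # Simple stopping rule on successive value estimates (an assumption of this sketch).
        if np.max(np.abs(v_new - v)) < tol:
            return policy, v_new
        v = v_new
    return policy, v


# Small two-state, two-action example with nonnegative rewards.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])
r = np.array([[1.0, 0.0], [0.5, 2.0]])
policy, v = modified_policy_iteration(P, r, beta=0.9, m=5)
print(policy, v)
```

Larger m spends more work per iteration on evaluating the current policy and typically needs fewer improvement steps; the report's point is that a whole family of such schemes can be generated by the choice of stopping time.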

Original language: English
Place of publication: Eindhoven
Publisher: Technische Hogeschool Eindhoven
Number of pages: 17
Publication status: Published, 1978

Publication series

Name: Memorandum COSOR
Volume: 7824
ISSN (Print): 0926-4493
