A policy improvement-value approximation algorithm for the ergodic average reward Markov decision process

J. van der Wal

Research output: Book/Report › Report › Academic

Abstract

This paper presents a policy improvement-value approximation algorithm for the average reward Markov decision process when all transition matrices are unichained. In contrast with Howard's algorithm, we do not solve for the exact gain and relative value vector but only approximate them. It is shown that the value approximation algorithm produces a nearly optimal strategy. This paper extends the results of a previous paper, in which transient states were not allowed; the algorithm is also slightly different from the one presented there.
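
The idea of replacing Howard's exact policy evaluation with an approximate one can be illustrated with a minimal Python sketch. It is not the algorithm of the report, whose details are not given in the abstract; it is a generic scheme in the same spirit, written under these assumptions: a finite MDP in which every stationary policy is unichain and aperiodic (here every state keeps a positive self-transition probability), a fixed number of evaluation sweeps per iteration, and a stopping rule based on the span of the one-step difference, whose minimum and maximum bracket the optimal gain. The names `backup`, `approx_policy_iteration`, `sweeps`, and `eps`, and the two-state example data, are hypothetical.

```python
def backup(rewards, trans, v, s, a):
    """One-step lookahead value of action a in state s, measured against v."""
    return rewards[s][a] + sum(p * vj for p, vj in zip(trans[s][a], v))


def approx_policy_iteration(rewards, trans, sweeps=5, eps=1e-6, max_iter=10_000):
    """Policy improvement with approximate (not exact) policy evaluation.

    rewards[s][a] is the one-step reward, trans[s][a] the transition row.
    Returns (policy, lo, hi), where lo <= optimal gain <= hi and hi - lo < eps.
    Assumption: all stationary policies are unichain and aperiodic.
    """
    n = len(rewards)
    v = [0.0] * n
    for _ in range(max_iter):
        # Policy improvement: pick a greedy action in every state.
        policy = [
            max(range(len(rewards[s])),
                key=lambda a: backup(rewards, trans, v, s, a))
            for s in range(n)
        ]
        w = [backup(rewards, trans, v, s, policy[s]) for s in range(n)]

        # The extremes of w - v bound the gain of the greedy policy from
        # below and the optimal gain from above.
        diff = [w[s] - v[s] for s in range(n)]
        lo, hi = min(diff), max(diff)
        if hi - lo < eps:
            return policy, lo, hi

        # Value approximation: a few sweeps with the policy held fixed,
        # instead of solving the evaluation equations exactly.
        v = w
        for _ in range(sweeps - 1):
            v = [backup(rewards, trans, v, s, policy[s]) for s in range(n)]

        # Subtract a reference value so the iterates stay bounded.
        ref = v[0]
        v = [x - ref for x in v]
    raise RuntimeError("no convergence within max_iter iterations")


# Hypothetical two-state, two-action example (illustrative data only).
rewards = [[1.0, 0.0], [3.0, 2.5]]
trans = [
    [[0.5, 0.5], [0.9, 0.1]],  # state 0: rows for actions 0 and 1
    [[0.5, 0.5], [0.1, 0.9]],  # state 1: rows for actions 0 and 1
]
policy, lo, hi = approx_policy_iteration(rewards, trans)
```

In this example the myopically best action in state 1 (reward 3.0) is not optimal in the long run. Working out the stationary distributions of the four stationary policies by hand gives a best gain of 2.25, attained by the policy `[0, 1]`, and the returned interval `[lo, hi]` should enclose that value.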
Original language: English
Place of Publication: Eindhoven
Publisher: Technische Hogeschool Eindhoven
Number of pages: 12
Publication status: Published - 1978

Publication series

Name: Memorandum COSOR
Volume: 7827
ISSN (Print): 0926-4493
