A stopping time-based policy iteration algorithm for average reward Markov decision processes

J. Wal, van der

Research output: Book/Report, Report, Academic

Abstract

We consider Howard's policy iteration algorithm for multichained finite-state, finite-action Markov decision processes under the criterion of average reward per unit time. Using stopping times, as Wessels did in the total-reward case, we obtain a set of policy improvement steps, among them the Gauss-Seidel step. We show that these steps give convergent algorithms and produce average-optimal strategies.
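For orientation, the following is a minimal Python/NumPy sketch of classical Howard policy iteration under the average-reward criterion. It is a deliberate simplification and not the report's method: it assumes every policy is unichain (so a single gain g exists), and it uses the ordinary improvement step rather than the stopping-time-based or Gauss-Seidel steps studied here. The function names `evaluate` and `policy_iteration` and the array layout `P[a, s, t]`, `r[a, s]` are illustrative assumptions.

```python
import numpy as np

def evaluate(P, r, policy, ref=0):
    """Solve g + h(s) = r(s, pi(s)) + sum_t P(t | s, pi(s)) h(t), with h(ref) = 0.

    Assumption: the chain induced by `policy` is unichain, so the system is nonsingular.
    """
    n = P.shape[1]
    s = np.arange(n)
    P_pi = P[policy, s]          # row s is P(. | s, policy[s])
    r_pi = r[policy, s]          # r(s, policy[s])
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = np.eye(n) - P_pi
    A[:n, n] = 1.0               # coefficient of the gain g
    A[n, ref] = 1.0              # normalisation h(ref) = 0
    b = np.append(r_pi, 0.0)
    x = np.linalg.solve(A, b)
    return x[n], x[:n]           # gain g, relative values h

def policy_iteration(P, r, policy=None, tol=1e-10, max_iter=1000):
    """Classical (Jacobi-style) policy iteration; NOT the stopping-time variants."""
    n_actions, n_states, _ = P.shape
    if policy is None:
        policy = np.zeros(n_states, dtype=int)
    policy = np.asarray(policy, dtype=int).copy()
    for _ in range(max_iter):
        g, h = evaluate(P, r, policy)
        q = r + P @ h            # q[a, s] = r(s, a) + sum_t P(t | s, a) h(t)
        best = q.max(axis=0)
        current = q[policy, np.arange(n_states)]
        # Switch only where another action is strictly better, so ties cannot cause cycling.
        improve = current < best - tol
        if not improve.any():
            return policy, g, h
        policy = np.where(improve, q.argmax(axis=0), policy)
    raise RuntimeError("policy iteration did not converge")

# Toy two-state, two-action example (hypothetical data).
P = np.array([[[0.5, 0.5], [0.5, 0.5]],    # action 0
              [[0.9, 0.1], [0.1, 0.9]]])   # action 1
r = np.array([[1.0, 0.0],                  # rewards of action 0 in states 0, 1
              [2.0, 2.0]])                 # rewards of action 1 in states 0, 1
policy, g, h = policy_iteration(P, r)
print(policy, g)  # policy [1 1], gain approximately 2.0
```

Starting from the policy that always uses action 0 (gain 0.5), one improvement step switches both states to action 1, whose gain of 2 cannot be improved. Handling multichain problems would additionally require a gain vector and a two-level improvement test, which this sketch omits.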
Original language: English
Place of Publication: Eindhoven
Publisher: Technische Hogeschool Eindhoven
Number of pages: 18
Publication status: Published - 1978

Publication series

Name: Memorandum COSOR
Volume: 7811
ISSN (Print): 0926-4493