A stopping time-based policy iteration algorithm for average reward Markov decision processes

J. van der Wal

Research output: Book/Report (Report, Academic)


Abstract

We consider Howard's policy iteration algorithm for multichain Markov decision processes with finite state and action spaces, under the criterion of average reward per unit time. Using stopping times, as Wessels did in the total-reward case, we obtain a set of policy improvement steps, among them Gauss-Seidel. We show that these steps yield convergent algorithms and produce average-optimal strategies.
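For orientation, the sketch below shows plain Howard policy iteration for the average-reward criterion. It is a minimal illustration under loudly stated assumptions, not the memorandum's method. It assumes the unichain case: every stationary policy has a single recurrent class, so one gain g and relative values h, with h[0] fixed at 0, solve the evaluation equations. It uses the standard Jacobi-style improvement step, in which every state is improved against the same h. The multichain analysis and the stopping-time-based improvement steps, Gauss-Seidel among them, are not reproduced. The data layout (`P[i][a][j]`, `R[i][a]`) and all function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical data layout (an assumption, not taken from the memorandum):
#   P[i][a][j] = probability of moving from state i to state j under action a
#   R[i][a]    = expected one-step reward in state i under action a
Matrix = List[List[float]]


def solve_linear(A: Matrix, b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def evaluate(P, R, policy) -> Tuple[float, List[float]]:
    """Policy evaluation: solve g + h[i] = R[i][a] + sum_j P[i][a][j] h[j].

    Fixes h[0] = 0, so the unknowns are (g, h[1], ..., h[n-1]).
    Uniquely solvable under the unichain assumption.
    """
    n = len(policy)
    A, b = [], []
    for i in range(n):
        a = policy[i]
        row = [1.0] + [0.0] * (n - 1)  # column 0 holds the coefficient of g
        for j in range(1, n):
            row[j] = (1.0 if i == j else 0.0) - P[i][a][j]
        A.append(row)
        b.append(R[i][a])
    x = solve_linear(A, b)
    return x[0], [0.0] + x[1:]


def q_value(P, R, h, i, a) -> float:
    """One-step lookahead value R[i][a] + sum_j P[i][a][j] h[j]."""
    return R[i][a] + sum(p * hj for p, hj in zip(P[i][a], h))


def policy_iteration(P, R, policy=None, tol=1e-12):
    """Howard policy iteration (unichain sketch); returns (policy, g, h)."""
    n = len(R)
    policy = list(policy) if policy is not None else [0] * n
    while True:
        g, h = evaluate(P, R, policy)
        new_policy = []
        for i in range(n):
            best = max(range(len(R[i])), key=lambda a: q_value(P, R, h, i, a))
            # Keep the current action unless another is strictly better,
            # which prevents cycling between tied actions.
            if q_value(P, R, h, i, best) > q_value(P, R, h, i, policy[i]) + tol:
                new_policy.append(best)
            else:
                new_policy.append(policy[i])
        if new_policy == policy:
            return policy, g, h
        policy = new_policy


# Toy two-state, two-action example (hypothetical data).
P = [
    [[0.5, 0.5], [0.9, 0.1]],  # state 0: actions 0 and 1
    [[0.5, 0.5], [0.1, 0.9]],  # state 1: actions 0 and 1
]
R = [[1.0, 2.0], [3.0, 0.0]]

policy, g, h = policy_iteration(P, R)
print(policy, round(g, 4))  # [1, 0] 2.1667
```

In this toy instance every policy's chain is irreducible, so the unichain assumption holds. Starting from policy [0, 0] with gain 2, one improvement step switches state 0 to action 1 and reaches the optimal gain 13/6. A Gauss-Seidel-type step would instead reuse already-improved states within a single sweep. The memorandum's stopping-time framework is what justifies such variants; the sketch does not attempt them.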
Original language: English
Place of publication: Eindhoven
Publisher: Technische Hogeschool Eindhoven
Number of pages: 18
Publication status: Published, 1978

Publication series

Name: Memorandum COSOR
Volume: 7811
ISSN (Print): 0926-4493
