A stopping time-based policy iteration algorithm for average reward Markov decision processes

J. van der Wal

Research output: Book/Report › Report › Academic



We consider Howard's policy iteration algorithm for multichained finite state and action Markov decision processes under the criterion of average reward per unit time. Using stopping times, as Wessels did in the total reward case, we obtain a set of policy improvement steps, among them Gauss-Seidel, which, as we show, give convergent algorithms and produce average optimal strategies.
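For orientation, the following is a minimal sketch of classical Howard policy iteration for average reward, not the stopping-time variants developed in the report. It assumes the simpler unichain case (every stationary policy has a single recurrent class), so the evaluation equations have a unique solution once the bias of state 0 is pinned to zero; in the multichained setting of the report this linear system can be singular. All names (`solve`, `evaluate`, `policy_iteration`, `P`, `r`) and the two-state example data are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            for k in range(c, n + 1):
                M[i][k] -= f * M[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def evaluate(P, r, policy):
    """Return (gain g, bias h) with h[0] = 0; assumes the policy is unichain."""
    n = len(policy)
    A, b = [], []
    for s in range(n):
        a = policy[s]
        # Unknowns are (g, h[1], ..., h[n-1]); equation: g + h[s] - sum_j P h[j] = r.
        A.append([1.0] + [(1.0 if j == s else 0.0) - P[s][a][j] for j in range(1, n)])
        b.append(r[s][a])
    x = solve(A, b)
    return x[0], [0.0] + x[1:]


def policy_iteration(P, r, tol=1e-9):
    """Alternate evaluation and improvement until no action changes."""
    n = len(P)
    policy = [0] * n
    while True:
        g, h = evaluate(P, r, policy)
        changed = False
        for s in range(n):
            q = [r[s][a] + sum(P[s][a][j] * h[j] for j in range(n))
                 for a in range(len(P[s]))]
            best = max(range(len(q)), key=q.__getitem__)
            # Switch only on strict improvement, so ties cannot cause cycling.
            if q[best] > q[policy[s]] + tol:
                policy[s] = best
                changed = True
        if not changed:
            return policy, g, h


# Hypothetical example: P[s][a][j] are transition probabilities, r[s][a] rewards.
P = [[[0.5, 0.5], [0.1, 0.9]],   # state 0 has two actions
     [[0.5, 0.5]]]               # state 1 has one action
r = [[1.0, 0.5],
     [3.0]]
policy, g, h = policy_iteration(P, r)
print(policy, g)
```

In this example the improvement step updates all states against the same bias vector `h`. A Gauss-Seidel style step, as mentioned in the abstract, would instead reuse values already updated within the same sweep; that variant is not shown here.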
Original language: English
Place of publication: Eindhoven
Publisher: Technische Hogeschool Eindhoven
Number of pages: 18
Status: Published - 1978

Publication series

Name: Memorandum COSOR
ISSN (print): 0926-4493
