A stopping time-based policy iteration algorithm for Markov decision processes with discountfactor tending to 1

J. van der Wal

Research output: Book/Report › Report › Academic

Abstract

This paper considers Markov decision processes with finite state and action spaces as the discount factor tends to 1. Miller and Veinott have shown the existence of n-discount optimal policies, and Veinott has given an algorithm to determine one. In this paper we use stopping times, as introduced by Wessels, to generate a set of modified policy iteration algorithms for determining an n-discount optimal strategy.
Original language: English
Place of publication: Eindhoven
Publisher: Technische Hogeschool Eindhoven
Number of pages: 17
Status: Published - 1978

Publication series

Name: Memorandum COSOR
Volume: 7824
Print ISSN: 0926-4493