The method of value oriented successive approximations for the average reward Markov decision process

J. van der Wal

Research output: Book/Report › Report › Academic

Abstract

In this paper we consider the Markov decision process with finite state and action spaces under the criterion of average reward per unit time. We consider the method of value oriented successive approximations, which has been studied extensively by Van Nunen for the total reward case. Under various conditions that guarantee the gain of the process to be independent of the starting state, together with a strong aperiodicity assumption, we show that the method converges and produces ε-optimal policies.
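
The abstract names the method without spelling it out. The following is a minimal sketch for illustration only, not taken from the report. It assumes the commonly described form of value oriented successive approximations: pick a greedy policy for the current value vector, then apply that policy's one-step operator `lam` times before improving again. The names `value_oriented_sa`, `P`, `r`, `lam`, and `eps` are hypothetical. The stopping rule assumes the standard bounds on the gain given by the minimum and maximum of the one-step value difference, which are meaningful when the gain does not depend on the starting state.

```python
def value_oriented_sa(P, r, lam=5, eps=1e-6, max_iter=10_000):
    """Sketch of value oriented successive approximations (average reward).

    P[a][i][j] is the probability of moving from state i to j under action a,
    r[a][i] is the one-step reward. Returns (policy, lower, upper), where
    lower and upper bracket the gain of the returned policy.
    All names here are illustrative assumptions, not the report's notation.
    """
    n_actions, n_states = len(P), len(P[0])
    v = [0.0] * n_states

    def backup(a, i, vec):
        # One-step lookahead value of action a in state i.
        return r[a][i] + sum(P[a][i][j] * vec[j] for j in range(n_states))

    for _ in range(max_iter):
        # Policy improvement: greedy action in every state.
        policy = [max(range(n_actions), key=lambda a: backup(a, i, v))
                  for i in range(n_states)]
        w = [backup(policy[i], i, v) for i in range(n_states)]

        # The extremes of the one-step difference bracket the gain.
        diff = [w[i] - v[i] for i in range(n_states)]
        lower, upper = min(diff), max(diff)
        if upper - lower < eps:
            return policy, lower, upper

        # Value oriented step: w already holds one application of the
        # policy's operator, so apply it lam - 1 more times.
        for _ in range(lam - 1):
            w = [backup(policy[i], i, w) for i in range(n_states)]

        # Subtract a constant so the values stay bounded; this changes
        # neither the greedy policy nor the differences above.
        shift = w[-1]
        v = [x - shift for x in w]

    raise RuntimeError("no convergence within max_iter iterations")


# Tiny example: two states, two actions, identical uniform transitions.
P = [[[0.5, 0.5], [0.5, 0.5]],
     [[0.5, 0.5], [0.5, 0.5]]]
r = [[1.0, 0.0],
     [0.0, 3.0]]
policy, lower, upper = value_oriented_sa(P, r, lam=3)
print(policy, lower, upper)
```

With `lam=1` the sketch reduces to ordinary successive approximation (value iteration); larger values of `lam` push it toward policy iteration. Every row of the example transition matrices has a positive diagonal entry, in the spirit of the strong aperiodicity assumption mentioned in the abstract.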
Original language: English
Place of publication: Eindhoven
Publisher: Technische Hogeschool Eindhoven
Number of pages: 28
Status: Published - 1979

Publication series

Name: Memorandum COSOR
Volume: 7907
ISSN (print): 0926-4493

Cite this

Wal, van der, J. (1979). The method of value oriented successive approximations for the average reward Markov decision process. (Memorandum COSOR; Vol. 7907). Eindhoven: Technische Hogeschool Eindhoven.