The method of value oriented successive approximations for the average reward Markov decision process

J. van der Wal

Research output: Book/Report › Report › Academic


Abstract

In this paper we consider the Markov decision process with finite state and action spaces under the criterion of average reward per unit time. We study the method of value oriented successive approximations, which has been extensively studied by Van Nunen for the total reward case. Under various conditions which guarantee that the gain of the process is independent of the starting state, together with a strong aperiodicity assumption, we show that the method converges and produces ε-optimal policies.
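
For orientation, the sketch below illustrates the kind of scheme the abstract describes: each iteration picks a maximizing (greedy) policy, applies that policy's one-step operator several times before the next improvement step, and uses the span of the one-step differences to bound the gain and to stop with an ε-optimal policy. This is an illustrative sketch only, not the report's algorithm; the function name, the Schweitzer-style aperiodicity transformation with parameter tau, and the span-based stopping rule are assumptions made here for concreteness.

import numpy as np

def value_oriented_sa(P, r, k=5, eps=1e-6, tau=0.5, max_iter=100000):
    # P   : (A, S, S) array, P[a, s, t] = probability of moving s -> t under action a
    # r   : (A, S) array, r[a, s] = immediate reward for action a in state s
    # k   : number of value updates under the fixed greedy policy per iteration
    #       (k = 1 gives ordinary successive approximations / value iteration)
    # tau : aperiodicity parameter in (0, 1); mixing each transition law with
    #       "stay put" enforces strong aperiodicity and leaves the gain of
    #       every stationary policy unchanged
    A, S, _ = P.shape
    P = tau * P + (1.0 - tau) * np.eye(S)      # strong aperiodicity transformation
    v = np.zeros(S)
    policy = np.zeros(S, dtype=int)

    for _ in range(max_iter):
        # Policy improvement: greedy policy with respect to the current values.
        q = r + np.einsum('ast,t->as', P, v)   # q[a, s]
        policy = q.argmax(axis=0)

        # Bounds on the gain from the one-step differences; when the gain is
        # independent of the starting state and the chain is strongly
        # aperiodic, their span shrinks and the greedy policy is eps-optimal.
        diff = q.max(axis=0) - v
        if diff.max() - diff.min() < eps:
            return policy, 0.5 * (diff.max() + diff.min())

        # Value-oriented step: apply the chosen policy's operator k times
        # before the next improvement step (the first application coincides
        # with the Bellman update computed above).
        P_pi = P[policy, np.arange(S), :]
        r_pi = r[policy, np.arange(S)]
        for _ in range(k):
            v = r_pi + P_pi @ v

    return policy, None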
Original language: English
Place of Publication: Eindhoven
Publisher: Technische Hogeschool Eindhoven
Number of pages: 28
Publication status: Published - 1979

Publication series

Name: Memorandum COSOR
Volume: 7907
ISSN (Print): 0926-4493

Cite this

van der Wal, J. (1979). The method of value oriented successive approximations for the average reward Markov decision process (Memorandum COSOR; Vol. 7907). Eindhoven: Technische Hogeschool Eindhoven.