A policy improvement-value approximation algorithm for the ergodic average reward Markov decision process

J. van der Wal

Research output: Book/Report › Report › Academic


Abstract

This paper presents a policy improvement-value approximation algorithm for the average reward Markov decision process when all transition matrices are unichained. In contrast with Howard's algorithm, we do not solve for the exact gain and relative value vector but only approximate them. It is shown that the value approximation algorithm produces a nearly optimal strategy. This paper extends the results of a previous paper in which transient states were not allowed; the algorithm presented here also differs slightly.
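
The abstract describes the scheme only at a high level, so the following is a minimal Python sketch of the general idea, assuming NumPy, an (A, S, S) array P of unichained transition matrices, and an (A, S) reward array r. The function name approx_policy_iteration, the use of a fixed number of relative value iteration sweeps as the evaluation approximation, and the reference state s0 = 0 are illustrative assumptions; the paper's actual algorithm and error bounds are not reproduced here.

import numpy as np

def approx_policy_iteration(P, r, n_eval_sweeps=10, n_iters=100):
    """Illustrative sketch (not the paper's exact method).

    P: (A, S, S) transition matrices, assumed unichained.
    r: (A, S) one-step rewards.
    Returns a policy plus approximate gain and relative values.
    """
    A, S, _ = P.shape
    policy = np.zeros(S, dtype=int)
    v = np.zeros(S)          # relative value approximation, v[0] == 0
    g = 0.0                  # gain approximation
    for _ in range(n_iters):
        P_pi = P[policy, np.arange(S)]   # (S, S): row s is P(. | s, policy[s])
        r_pi = r[policy, np.arange(S)]   # (S,): reward under current policy
        # Approximate evaluation: a few relative value iteration sweeps
        # instead of solving g*1 + v = r_pi + P_pi @ v exactly.
        for _ in range(n_eval_sweeps):
            w = r_pi + P_pi @ v
            g = w[0] - v[0]              # gain estimate at reference state 0
            v = w - w[0]                 # renormalize so v[0] stays 0
        # Policy improvement: act greedily w.r.t. the approximate values.
        q = r + P @ v                    # (A, S) action values
        new_policy = q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            break                        # no improving action found
        policy = new_policy
    return policy, g, v

# Tiny random example (illustrative only): Dirichlet rows are strictly
# positive, so every policy's chain is irreducible, hence unichained.
rng = np.random.default_rng(0)
A, S = 3, 5
P = rng.dirichlet(np.ones(S), size=(A, S))
r = rng.standard_normal((A, S))
policy, gain, rel_values = approx_policy_iteration(P, r)

The design point mirrors the abstract: each outer pass performs policy improvement using approximate, rather than exactly solved, gain and relative values, which is what distinguishes the method from Howard's policy iteration.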
Original language: English
Place of publication: Eindhoven
Publisher: Technische Hogeschool Eindhoven
Number of pages: 12
Publication status: Published - 1978

Publication series

Name: Memorandum COSOR
Volume: 7827
ISSN (Print): 0926-4493

Cite this

Wal, van der, J. (1978). A policy improvement-value approximation algorithm for the ergodic average reward Markov decision process. (Memorandum COSOR; Vol. 7827). Eindhoven: Technische Hogeschool Eindhoven.