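### Abstract

This paper presents a policy improvement-value approximation algorithm for the average reward Markov decision process when all transition matrices are unichained. In contrast with Howard's algorithm, we do not solve for the exact gain and relative value vector but only approximate them. It is shown that the value approximation algorithm produces a nearly optimal strategy. This paper extends the results of a previous paper in which transient states were not allowed. The algorithm also differs slightly from the earlier one.
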
Original language | English |
---|---|
Place of Publication | Eindhoven |
Publisher | Technische Hogeschool Eindhoven |
Number of pages | 12 |
Publication status | Published - 1978 |

### Publication series

Name | Memorandum COSOR |
---|---|
Volume | 7827 |
ISSN (Print) | 0926-4493 |

### Cite this

Wal, van der, J. (1978). *A policy improvement-value approximation algorithm for the ergodic average reward Markov decision process*. (Memorandum COSOR; Vol. 7827). Eindhoven: Technische Hogeschool Eindhoven.

Wal, van der, J. *A policy improvement-value approximation algorithm for the ergodic average reward Markov decision process*. Memorandum COSOR, vol. 7827, Technische Hogeschool Eindhoven, Eindhoven, 1978.

**A policy improvement-value approximation algorithm for the ergodic average reward Markov decision process.** / Wal, van der, J.

Research output: Book/Report › Report › Academic

TY - BOOK

T1 - A policy improvement-value approximation algorithm for the ergodic average reward Markov decision process

AU - Wal, van der, J.

PY - 1978

Y1 - 1978

N2 - This paper presents a policy improvement-value approximation algorithm for the average reward Markov decision process when all transition matrices are unichained. In contrast with Howard's algorithm we do not solve for the exact gain and relative value vector but only approximate them. It is shown that the value approximation algorithm produces a nearly optimal strategy. This paper extends the results of a previous paper in which transient states were not allowed. Also the algorithm is slightly different.

M3 - Report

T3 - Memorandum COSOR

BT - A policy improvement-value approximation algorithm for the ergodic average reward Markov decision process

PB - Technische Hogeschool Eindhoven

CY - Eindhoven

ER -
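
The abstract contrasts the approach with Howard's algorithm: the gain and relative value vector of each policy are approximated rather than solved for exactly. The sketch below illustrates that general idea and is not the algorithm of the memorandum, whose details are not given on this page. It assumes a finite unichain model whose chains are also aperiodic, represented by the hypothetical structures `P[s][a]` (a row of transition probabilities) and `r[s][a]` (a one-step reward). The parameter `m` (number of value sweeps per policy), the tolerance `eps`, and all function names are illustrative choices. The stopping rule uses the standard bounds: for a greedy policy, the smallest and largest components of the one-step change in the value vector bracket both that policy's gain and the optimal gain.

```python
def dot(row, v):
    """Inner product of a probability row with a value vector."""
    return sum(p * x for p, x in zip(row, v))


def greedy_step(P, r, v):
    """Policy improvement: per state, pick the action maximising r + P v."""
    policy, best_values = [], []
    for s in range(len(v)):
        q = [r[s][a] + dot(P[s][a], v) for a in range(len(r[s]))]
        a_best = max(range(len(q)), key=lambda a: q[a])  # first maximiser on ties
        policy.append(a_best)
        best_values.append(q[a_best])
    return policy, best_values


def apply_policy(P, r, policy, v):
    """One value sweep under a fixed policy: v -> r_f + P_f v."""
    return [r[s][a] + dot(P[s][a], v) for s, a in enumerate(policy)]


def improve_and_approximate(P, r, m=5, eps=1e-8, max_iter=10_000):
    """Alternate a greedy improvement step with m value sweeps (assumed scheme).

    Returns the final policy and (lo, hi), an interval that contains the
    optimal gain and whose lower end bounds the gain of the returned policy.
    """
    n = len(r)
    v = [0.0] * n
    for _ in range(max_iter):
        policy, w = greedy_step(P, r, v)
        diff = [w[s] - v[s] for s in range(n)]
        lo, hi = min(diff), max(diff)
        if hi - lo < eps:
            return policy, (lo, hi)
        # Value approximation: m - 1 further sweeps instead of an exact solve.
        for _ in range(m - 1):
            w = apply_policy(P, r, policy, w)
        # Subtract a reference component so the values stay bounded.
        v = [x - w[-1] for x in w]
    raise RuntimeError("span did not fall below eps within max_iter iterations")


# Hypothetical two-state, two-action example (all numbers are made up).
P = [[[0.5, 0.5], [0.1, 0.9]],
     [[0.5, 0.5], [0.9, 0.1]]]
r = [[5.0, 10.0],
     [-1.0, -2.0]]

policy, (lo, hi) = improve_and_approximate(P, r)
print(policy, lo, hi)
```

Setting `m=1` reduces the sketch to plain successive approximation, while a large `m` approaches exact policy evaluation in the style of Howard's algorithm. The interval `(lo, hi)` gives a direct measure of how close to optimal the returned policy is.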