Abstract
A class of Markov decision processes with finite state and action spaces and an incompletely known transition mechanism is considered. The controller seeks a strategy that maximizes the Bayesian expected total discounted return. Section 2 gives approximations for this optimal value, and Section 3 indicates how to compute the value for a fixed prior distribution.
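
As a reading aid, the objective named in the abstract can be written out roughly as follows. This is a sketch in assumed notation, not taken from the memorandum itself: $\theta$ stands for the unknown parameter of the transition mechanism, $q$ for the prior distribution on $\theta$, $\beta \in [0,1)$ for the discount factor, $r$ for the one-step reward, $\pi$ for a strategy, and $i$ for the initial state.

$$
v(i, q) \;=\; \sup_{\pi} \int \mathbb{E}^{\pi}_{i,\theta}\!\left[\, \sum_{n=0}^{\infty} \beta^{n}\, r(X_n, A_n) \right] q(d\theta)
$$

Under these assumptions, $v(i, q)$ is the value that Section 2 approximates and that Section 3 computes for a fixed prior $q$.
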
Original language | English |
---|---|
Place of Publication | Eindhoven |
Publisher | Technische Hogeschool Eindhoven |
Number of pages | 11 |
Publication status | Published - 1976 |
Publication series
Name | Memorandum COSOR |
---|---|
Volume | 7615 |
ISSN (Print) | 0926-4493 |