We consider a class of Markov decision processes with finite state and action spaces and an incompletely known transition mechanism. The controller seeks a strategy that maximizes the Bayesian expected total discounted return. Section 2 gives approximations for the optimal value of this return, and Section 3 indicates how to compute that value for a fixed prior distribution.
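As a reading aid, the criterion named in the abstract can be written out in one common form. This is only a sketch in notation assumed here, not taken from the report: $\theta \in \Theta$ indexes the unknown transition law, $q$ is the prior distribution on $\Theta$, $\beta \in [0,1)$ is the discount factor, $r$ is the one-step reward, $(X_n, A_n)$ are the state and action at time $n$, and $\mathbb{E}^{\pi}_{i,\theta}$ is the expectation under strategy $\pi$, initial state $i$, and parameter $\theta$.

```latex
% Bayesian expected total discounted return of strategy \pi
% (all symbols are illustrative assumptions, not the report's notation)
v(i, q, \pi) \;=\; \int_{\Theta} \mathbb{E}^{\pi}_{i,\theta}
  \left[ \sum_{n=0}^{\infty} \beta^{n}\, r(X_n, A_n) \right] q(\mathrm{d}\theta),
\qquad
v(i, q) \;=\; \sup_{\pi} \, v(i, q, \pi).
```

In this notation, the value that the abstract refers to is $v(i, q)$: the approximations would bound it, and the computation for a fixed prior would evaluate it at one particular $q$.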
|Place of Publication||Eindhoven|
|Publisher||Technische Hogeschool Eindhoven|
|Number of pages||11|
|Publication status||Published - 1976|