Markov decision processes with a possibly unbounded reward structure are considered. Conditions are given under which successive approximations converge in a strong sense. This "strong" convergence enables the construction of upper and lower bounds.
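The abstract does not reproduce the report's conditions. As an illustration only, one common way to make "strong" convergence precise is a weighted supremum norm. The sketch below assumes conditions of that type: a positive weight function \(\mu\) on the states, a discount factor \(\beta\), and finite action sets. These are stand-ins chosen for this derivation and may differ from the report's own conditions. The derivation shows how contraction in such a norm yields upper and lower bounds scaled by \(\mu\).

```latex
% Assumed (illustrative) conditions: \mu(i) > 0 for every state i, and
\sup_{i,a} \frac{|r(i,a)|}{\mu(i)} \le M < \infty, \qquad
\rho := \sup_{i,a} \frac{\beta \sum_j p(j \mid i,a)\,\mu(j)}{\mu(i)} < 1.

% Weighted supremum norm and successive-approximation operator:
\|v\|_\mu := \sup_i \frac{|v(i)|}{\mu(i)}, \qquad
(Uv)(i) := \max_a \Big\{ r(i,a) + \beta \sum_j p(j \mid i,a)\, v(j) \Big\}.

% Contraction, using |\max_a f(a) - \max_a g(a)| \le \max_a |f(a) - g(a)|:
|(Uv)(i) - (Uw)(i)|
  \le \max_a \beta \sum_j p(j \mid i,a)\,\mu(j)\,\|v - w\|_\mu
  \le \rho\,\mu(i)\,\|v - w\|_\mu
\;\Longrightarrow\; \|Uv - Uw\|_\mu \le \rho\,\|v - w\|_\mu.

% With v_n = U v_{n-1}, fixed point v^* = U v^*, and \delta_n := \|v_n - v_{n-1}\|_\mu:
\|v^* - v_n\|_\mu = \|Uv^* - Uv_{n-1}\|_\mu
  \le \rho\,\|v^* - v_{n-1}\|_\mu
  \le \rho\,\big(\|v^* - v_n\|_\mu + \delta_n\big)
\;\Longrightarrow\; \|v^* - v_n\|_\mu \le \frac{\rho}{1-\rho}\,\delta_n.

% State-wise upper and lower bounds, scaled by \mu:
v_n(i) - \frac{\rho}{1-\rho}\,\delta_n\,\mu(i)
  \;\le\; v^*(i) \;\le\;
v_n(i) + \frac{\rho}{1-\rho}\,\delta_n\,\mu(i).
```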
The conditions are weaker than those proposed by Lippman [15], Harrison [5], and Wessels [28], and are in fact a slight generalization of those proposed by Van Nunen [21].
A successive approximation algorithm is outlined. The conditions are analysed and compared with those in the literature.
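A minimal Python sketch of successive approximation (value iteration) for a finite discounted MDP is given below. It reports bounds of the form \(v_n(i) \pm \frac{\rho}{1-\rho}\,\|v_n - v_{n-1}\|_\mu\,\mu(i)\), where \(\rho = \max_{i,a} \beta \sum_j p(j \mid i,a)\,\mu(j)/\mu(i) < 1\). This is not the report's algorithm. The finite state space, the discount factor `beta`, the weight list `mu`, the data layout, and all function names (`weighted_norm`, `contraction_modulus`, `bellman`, `successive_approximation`) are hypothetical choices for illustration. With finitely many states the rewards are automatically bounded, so `mu` here only shows how the bounds scale with the weight.

```python
from typing import List, Tuple

# Hypothetical data layout for this sketch:
# P[i][a] lists (next_state, probability) pairs for action a in state i,
# r[i][a] is the one-step reward for action a in state i.
Transitions = List[List[List[Tuple[int, float]]]]
Rewards = List[List[float]]


def weighted_norm(v: List[float], mu: List[float]) -> float:
    """||v||_mu = max_i |v(i)| / mu(i) on a finite state space."""
    return max(abs(x) / m for x, m in zip(v, mu))


def contraction_modulus(P: Transitions, mu: List[float], beta: float) -> float:
    """rho = max_{i,a} beta * sum_j p(j|i,a) * mu(j) / mu(i)."""
    return max(
        beta * sum(p * mu[j] for j, p in P[i][a]) / mu[i]
        for i in range(len(P))
        for a in range(len(P[i]))
    )


def bellman(v: List[float], r: Rewards, P: Transitions, beta: float) -> List[float]:
    """(Uv)(i) = max_a [ r(i,a) + beta * sum_j p(j|i,a) * v(j) ]."""
    return [
        max(
            r[i][a] + beta * sum(p * v[j] for j, p in P[i][a])
            for a in range(len(P[i]))
        )
        for i in range(len(P))
    ]


def successive_approximation(
    r: Rewards, P: Transitions, beta: float, mu: List[float],
    tol: float = 1e-8, max_iter: int = 10_000,
) -> Tuple[List[float], List[float], List[float], int]:
    """Iterate v_n = U v_{n-1} from v_0 = 0 and return (v_n, lower, upper, n),
    where lower <= v* <= upper holds state-wise at every step."""
    rho = contraction_modulus(P, mu, beta)
    if rho >= 1.0:
        raise ValueError(f"weight mu gives no contraction: rho = {rho}")
    v = [0.0] * len(P)
    for n in range(1, max_iter + 1):
        v_new = bellman(v, r, P, beta)
        delta = weighted_norm([x - y for x, y in zip(v_new, v)], mu)
        # ||v* - v_n||_mu <= rho / (1 - rho) * ||v_n - v_{n-1}||_mu
        c = rho / (1.0 - rho) * delta
        lower = [x - c * m for x, m in zip(v_new, mu)]
        upper = [x + c * m for x, m in zip(v_new, mu)]
        v = v_new
        if c <= tol:  # stop once the weighted gap is below tol
            break
    return v, lower, upper, n
```

As a usage example, consider state 0 with two actions (reward 1 and stay, or reward 0.5 and move to state 1) and state 1 with one action (reward 2 and stay), with `beta = 0.5` and `mu = [1.0, 1.0]`. The iterates should approach the optimal values 2.5 and 4.0, and each reported interval contains them. The weight `mu = [1.0, 2.0]` gives \(\rho = 1\) for this model, so the sketch rejects it rather than claiming bounds that the contraction argument does not support.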
| Field | Value |
|---|---|
| Name | Memorandum COSOR |
| Volume | 7623 |
| ISSN of printed version | 0926-4493 |