In this paper the following result is proved. In any countable state Markov decision process with the total reward criterion, for every ε > 0 there exists a Markov strategy p which is uniformly nearly optimal in the following sense: v(p) ≥ v* − ε(e + u*). Here v* denotes the value function of the process, u* denotes the value of the process when all negative rewards are neglected, and e is the unit function.
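The inequality above can be set out more explicitly as follows (a sketch of the statement only; the quantifier structure, with ε > 0 arbitrary and the strategy p depending on ε, is an assumption read off from the phrase "uniformly nearly-optimal"):

```latex
% For every tolerance \varepsilon > 0 there is a Markov strategy p
% whose total-reward value v(p) falls short of the value function
% v^* by at most \varepsilon in each state, scaled by e + u^*:
\[
  \forall \varepsilon > 0 \;\; \exists\, p \text{ Markov}: \qquad
  v(p) \;\ge\; v^{*} \;-\; \varepsilon\,(e + u^{*}),
\]
% where v^* is the value function of the process, u^* the value of
% the process when all negative rewards are neglected, and e the
% unit function (so the bound is uniform over the countable state space).
```

The term ε(e + u*) makes the bound state-dependent: in states where even the positive-part value u* is large, a proportionally larger shortfall is tolerated.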
Place of Publication: Eindhoven
Publisher: Technische Hogeschool Eindhoven
Number of pages: 14
Publication status: Published - 1981