We consider Howard's policy iteration algorithm for multichain finite state and action Markov decision processes under the criterion of average reward per unit time. Using stopping times, as Wessels has done in the total reward case, we obtain a set of policy improvement steps, among them a Gauss-Seidel variant, which, as we show, yield convergent algorithms and produce average optimal strategies.
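As a rough illustration of the kind of algorithm the report studies, the sketch below implements standard Howard policy iteration for average reward on a small two-state, two-action MDP. It is restricted to the unichain case for simplicity (the report treats the harder multichain case and stopping-time-based improvement steps such as Gauss-Seidel); the example MDP, its numbers, and all function names are invented for illustration.

```python
import numpy as np

# Illustrative 2-state, 2-action MDP (not from the report).
# P[a][s, t] = transition probability s -> t under action a; r[a][s] = reward.
P = [np.array([[0.9, 0.1],
               [0.2, 0.8]]),
     np.array([[0.5, 0.5],
               [0.6, 0.4]])]
r = [np.array([1.0, 0.0]),
     np.array([2.0, 0.5])]
n = 2  # number of states

def evaluate(policy):
    """Policy evaluation: solve g + h(s) = r(s, a) + sum_t P(t|s, a) h(t)
    for the gain g and relative values h, normalizing h(0) = 0."""
    A = np.zeros((n, n))
    b = np.zeros(n)
    for s in range(n):
        a = policy[s]
        A[s, 0] = 1.0  # coefficient of the unknown gain g
        for t in range(1, n):
            A[s, t] = (1.0 if s == t else 0.0) - P[a][s, t]
        b[s] = r[a][s]
    x = np.linalg.solve(A, b)
    return x[0], np.concatenate(([0.0], x[1:]))  # g, h

def improve(h):
    """Improvement step: pick the action maximizing r(s, a) + P(.|s, a) @ h."""
    return np.array([np.argmax([r[a][s] + P[a][s] @ h for a in range(2)])
                     for s in range(n)])

policy = np.zeros(n, dtype=int)
for _ in range(20):
    g, h = evaluate(policy)
    new_policy = improve(h)
    if np.array_equal(new_policy, policy):
        break  # improvement step leaves the policy unchanged: optimal
    policy = new_policy
print("optimal policy:", policy, "gain:", g)
```

In the unichain setting this greedy improvement on the relative values already converges; the multichain case the report addresses additionally requires comparing gains across actions, which is where the more refined improvement steps come in.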
Place of Publication: Eindhoven
Publisher: Technische Hogeschool Eindhoven
Number of pages: 18
Publication status: Published - 1978