An efficient algorithm for solving Markov decision problems is proposed. The value iteration method of dynamic programming is used in conjunction with a test for nonoptimal actions. The algorithm applies to problems with undiscounted or discounted returns with infinite or finite planning horizon. In the finite horizon case the discount factor may exceed unity. The nonoptimality test, which is an extension of Hastings test for the undiscounted reward case, is used to identify actions which cannot be optimal at the current stage. As convergence proceeds the proportion of such actions increases producing major computational savings. For problems with discount factor less than one the test is shown to be tighter than that of MacQueen.

Name | Memorandum COSOR |
---|

Volume | 7620 |
---|

ISSN (Print) | 0926-4493 |
---|