Abstract
In every finite-state leavable gambling problem and in every finite-state Markov decision process with discounted, negative or positive reward criteria there exists a Markov strategy which is monotonically improving and optimal in the limit along every history. An example is given to show that for the positive and gambling cases such strategies cannot be constructed by simply switching to a "better" action or gamble at each successive return to a state.
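To give a concrete sense of "monotonically improving" strategies in the discounted case, the sketch below runs greedy policy improvement on a small finite-state MDP, where each improvement step never lowers the value at any state and the sequence of stationary Markov policies reaches an optimal one. This is only an illustration under assumptions: the MDP data and the helper names `evaluate` and `improve` are hypothetical and not taken from the paper, and this simple switch-to-a-better-action scheme is exactly what the paper's counterexample shows can fail in the positive and gambling cases.

```python
import numpy as np

# Illustrative finite-state discounted MDP (states, actions, rewards, and
# transition probabilities are hypothetical, not taken from the paper).
n_states, n_actions = 3, 2
beta = 0.9  # discount factor

# P[a, s, s'] = transition probability; R[s, a] = expected one-step reward
P = np.array([
    [[0.8, 0.2, 0.0], [0.1, 0.6, 0.3], [0.0, 0.3, 0.7]],   # action 0
    [[0.5, 0.5, 0.0], [0.0, 0.2, 0.8], [0.2, 0.2, 0.6]],   # action 1
])
R = np.array([[1.0, 0.5],
              [0.0, 2.0],
              [0.5, 1.0]])

def evaluate(policy):
    """Value of a stationary Markov policy: V = (I - beta * P_pi)^{-1} R_pi."""
    P_pi = P[policy, np.arange(n_states), :]
    R_pi = R[np.arange(n_states), policy]
    return np.linalg.solve(np.eye(n_states) - beta * P_pi, R_pi)

def improve(policy, V):
    """Greedy one-step improvement; never decreases the value in the discounted case."""
    Q = R + beta * np.einsum("ast,t->sa", P, V)
    return Q.argmax(axis=1)

policy = np.zeros(n_states, dtype=int)
while True:
    V = evaluate(policy)
    new_policy = improve(policy, V)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy

print("limit policy:", policy, "values:", np.round(evaluate(policy), 3))
```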
| Original language | English |
|---|---|
| Pages (from-to) | 463-473 |
| Number of pages | 11 |
| Journal | Mathematics of Operations Research |
| Volume | 12 |
| Issue number | 3 |
| DOIs | |
| Publication status | Published - 1987 |