This paper investigates how the concept of value-conserving strategies can be generalized from Markov decision processes to Markov games. We prove that optimal Markov strategies are necessarily saddle conserving, which is the most straightforward generalization. A second generalization, called saddling, is shown to be a sufficient condition for optimality under relatively strong assumptions on the convergence of total expected rewards. Counterexamples show that saddle conserving is not sufficient for optimality, even under these strong convergence assumptions, and we prove that saddling is not necessary.
| Name | Memorandum COSOR |
|---|---|
| Volume | 7614 |
| ISSN (print) | 0926-4493 |