Structures of optimal policies in MDPs with unbounded jumps: the state of our art

H. Blok, F. Spieksma

Research output: Chapter in Book/Report/Conference proceeding › Chapter › Academic › peer-review

4 Citations (Scopus)

Abstract

The derivation of structural properties of countable state Markov decision processes (MDPs) is generally based on sample path methods or value iteration arguments. In the latter case, the method is to prove the structural properties of interest inductively for the n-horizon value function. A limit argument should then allow one to deduce the structural properties for the infinite-horizon value function.
In the case of discrete time MDPs with the objective of minimising the total expected α-discounted cost, this procedure is justified under mild conditions. When the objective is to minimise the long run average expected cost, value iteration does not necessarily converge. Allowing time to be continuous does not generate any further complications when the jump rates are bounded as a function of state, because uniformisation is applicable. However, when the jump rates are unbounded as a function of state, uniformisation is only applicable after a suitable perturbation of the jump rates that does not destroy the desired structural properties. Thus a second limit argument is also required.
The importance of unbounded rate countable state MDPs has increased lately, due to applications modelling customer or patient impatience and abandonment. The theory validating the required limit arguments, however, does not seem to be complete, and results are scattered over the literature.
In this chapter our objective has been to provide a systematic way to tackle this problem under relatively mild conditions, and to provide the necessary theory validating the presented approach. The base model is a parametrised Markov process (MP): both perturbed MPs and MDPs are special cases of a parametrised MP. The advantage is that the parameter can simultaneously model a policy and a perturbation.
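The procedure outlined in the abstract (perturb the unbounded rates, uniformise, then run value iteration) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the chapter: the model is a single-server queue with admission control and abandonment rate `beta * x`, the perturbation is a plain truncation of the state space at `N`, and the function name, parameter names, cost structure and default values are hypothetical. Whether a given perturbation preserves the structural properties of interest is exactly the question the chapter addresses; this sketch does not settle it.

```python
def uniformised_value_iteration(N=50, lam=1.0, mu=1.2, beta=0.1,
                                h=1.0, r=5.0, alpha=0.1,
                                tol=1e-9, max_iter=100_000):
    """Discounted-cost value iteration for a truncated, uniformised queue.

    Hypothetical model: arrivals at rate lam, service at rate mu,
    abandonment at rate beta * x (unbounded in x before truncation),
    holding cost h * x per unit time, cost r per rejected arrival,
    continuous-time discount rate alpha.
    """
    # After truncation at N the total jump rate is bounded by Lam,
    # so uniformisation with constant Lam is applicable.
    Lam = lam + mu + beta * N
    V = [0.0] * (N + 1)
    policy = [0] * (N + 1)
    for _ in range(max_iter):
        V_new, policy = [], []
        for x in range(N + 1):
            down = (mu if x > 0 else 0.0) + beta * x
            best = None
            for a in (0, 1):  # 0 = reject arrivals, 1 = admit arrivals
                admit = (a == 1 and x < N)  # truncation: no admission at N
                up = lam if admit else 0.0
                cost = h * x + (0.0 if admit else lam * r)
                stay = Lam - up - down  # fictitious self-transition rate
                q = (cost
                     + up * V[min(x + 1, N)]
                     + down * V[max(x - 1, 0)]
                     + stay * V[x]) / (alpha + Lam)
                if best is None or q < best[0]:
                    best = (q, a)
            V_new.append(best[0])
            policy.append(best[1])
        diff = max(abs(u - v) for u, v in zip(V_new, V))
        V = V_new
        if diff < tol:
            break
    return V, policy, Lam


if __name__ == "__main__":
    V, policy, Lam = uniformised_value_iteration()
    print("uniformisation constant:", Lam)
    print("policy (1 = admit) per state:", policy)
```

Because the uniformisation constant grows with `N`, the contraction factor `Lam / (alpha + Lam)` approaches 1 as the truncation level increases, which is one way to see why the limit in the perturbation parameter needs its own argument. Inspecting `policy` for several values of `N` is a quick, informal check of whether a structure such as a threshold rule appears to survive the truncation.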
Original language: English
Title of host publication: Markov Decision Processes in Practice
Editors: Richard Boucherie, Nico van Dijk
Place of Publication: Dordrecht
Publisher: Springer
Pages: 131-186
Number of pages: 56
Edition: 1
ISBN (Electronic): 978-3-319-47766-4
ISBN (Print): 978-3-319-47764-0
Publication status: Published - 2017

Publication series

Name: International Series in Operations Research & Management Science, ISOR
Volume: 248
