URL study guide
https://tue.osiris-student.nl/onderwijscatalogus/extern/cursus?cursuscode=2MMS50&collegejaar=2025&taal=en
Description
Many decision and optimization problems that arise in practical scenarios are stochastic in nature, in that they involve information or phenomena that are intrinsically random or extremely difficult to predict with any degree of certainty. Markov decision processes and stochastic dynamic programming models provide a powerful methodological framework for modeling and solving such problems. The course covers the three main problem formulations (finite horizon, infinite-horizon average reward/cost, and infinite-horizon discounted reward/cost) as well as the three main computational approaches (value iteration, policy iteration and linear programming). The course also treats Gittins-index policies for so-called multi-armed bandit problems and various learning algorithms for optimizing exploration-exploitation trade-offs in scenarios with unknown parameter values. In addition, the course presents several miscellaneous topics and techniques in stochastic decision and optimization problems, such as newsboy problems, achievable performance regions, optimal stopping problems and stochastic approximation methods.