Abstract
Optimizing resource allocation in business processes is important because it reduces delays, improves efficiency, and helps organizations deliver timely and reliable outcomes. Within the field of business process management, various methods have been proposed to optimize process outcomes. In predictive process monitoring (PPM), methods predict future states of a case, such as its outcome or its path through the process, based on the tasks that were executed so far for that case, also known as the prefix. The decision on what to do with the case, based on these predictions, is then left to the judgment of the user. Prescriptive process monitoring (PrPM) automates this step by recommending the best next action, such as which resource to allocate to achieve the lowest cycle time. However, these methods only optimize the outcomes of individual cases and do not consider the impact of an intervention in one case on another. In Data-Driven Business Process Optimization (BPO), methods aim to optimize the entire process with respect to a key performance indicator.
In this thesis, we propose a series of Deep Reinforcement Learning (DRL) methods to optimize resource allocation policies in business processes. The research starts by formalizing the resource allocation problem and, building on these definitions, we propose a general DRL framework. Our framework defines the four core components of a DRL method: the state space, the action space, the reward function, and the learning algorithm. We evaluate our method in a controlled environment using synthetic processes, referred to as scenarios, that reflect typical real-world business process optimization problems. Moreover, we combine the scenarios into composite processes to study the generalizability of our method. The results show that our method outperforms the best benchmark by, on average, 4.5% in the scenarios and 12.7% in the composite processes. Furthermore, we train our method on a real-world business process simulated using a hybrid simulation model and demonstrate that it achieves, on average, a 45% lower mean cycle time per case compared to the benchmarks.
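To make the four components concrete, the following is a minimal, illustrative sketch of how they could fit together; the toy process, its dynamics, and the tabular Q-update are assumptions for illustration, not the thesis implementation.

```python
# Minimal, illustrative sketch of the four DRL components; the toy process,
# its dynamics, and the tabular Q-update are assumptions for illustration,
# not the thesis implementation.
import random

class ToyAllocationEnv:
    def __init__(self):
        self.resources = ["r1", "r2"]
        self.task_types = ["a", "b"]
        # Action space: every possible resource-to-task assignment.
        self.actions = [(r, t) for r in self.resources for t in self.task_types]
        self.queues = {"a": 3, "b": 3}  # waiting cases per task type

    def state(self):
        # State space: aggregated process information (here, queue lengths).
        return (self.queues["a"], self.queues["b"])

    def step(self, action_idx):
        _, task = self.actions[action_idx]
        if self.queues[task] > 0:
            self.queues[task] -= 1  # the assigned resource serves one case
        # Reward function: negative remaining work, a stand-in for the
        # (negative) cycle time that the agent should minimize.
        reward = -sum(self.queues.values())
        done = all(q == 0 for q in self.queues.values())
        return self.state(), reward, done

# Learning algorithm: tabular Q-learning as the simplest stand-in for the
# deep RL algorithm trained on this interface.
env, q = ToyAllocationEnv(), {}
state, done = env.state(), False
while not done:
    action = random.randrange(len(env.actions))  # pure exploration
    next_state, reward, done = env.step(action)
    key = (state, action)
    q[key] = q.get(key, 0.0) + 0.1 * (reward - q.get(key, 0.0))
    state = next_state
```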
In subsequent studies, we improve each of the four components of our general DRL framework. We first improve the reward function and the learning algorithm. We introduce a dense reward function that decomposes the sum of cycle times into per-step contributions and thereby eliminates the need for reward engineering. Furthermore, we propose a rollout-based algorithm that learns the best action in each state by simulating execution trajectories. The results demonstrate that our reward function, in combination with our algorithm, can learn an optimal resource allocation policy in all scenarios. Furthermore, our method outperforms or matches the benchmarks on the composite processes.
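As an illustration, the sketch below shows one plausible dense-reward decomposition and a rollout-style action selector. Both are our assumptions rather than the thesis's exact formulations: the reward exploits the fact that the sum of cycle times equals the time integral of the number of open cases, and the rollout code assumes an environment offering `copy()` and `step()` as in the earlier sketch.

```python
# Dense reward (an assumed formulation): since the sum of cycle times equals
# the integral over time of the number of open cases, the reward for a
# decision step of length dt can simply be -(open cases * dt).
def dense_reward(open_cases: int, dt: float) -> float:
    return -open_cases * dt

# Summing the step rewards recovers the (negative) sum of cycle times,
# so no hand-tuned reward engineering is required:
steps = [(3, 1.0), (2, 0.5), (1, 2.0)]  # (open cases, elapsed time)
assert sum(dense_reward(n, dt) for n, dt in steps) == -6.0

# Rollout-based action selection (also a sketch): estimate each action's
# value by simulating trajectories under a base policy.
def rollout_value(env, action, base_policy, depth=20, k=5):
    total = 0.0
    for _ in range(k):
        sim = env.copy()  # simulate on a copy of the current state
        state, reward, done = sim.step(action)
        total += reward
        for _ in range(depth):
            if done:
                break
            state, r, done = sim.step(base_policy(state))
            total += r
    return total / k

def best_action(env, actions, base_policy):
    # Pick the action with the highest average simulated return.
    return max(actions, key=lambda a: rollout_value(env, a, base_policy))
```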
We also propose to enhance the state space representation with prefixes. As shown in the fields of PPM and PrPM, prefixes are predictive of future process states and can be used to predict the best decision to take for a case. However, existing BPO approaches neglect case-level information and focus solely on optimization from the process perspective, based on information aggregated across all cases. This thesis bridges the gap between PrPM and BPO by leveraging case-level information, in the form of prefixes, and combining it with process-level information. We evaluate our method on four prefix-dependent scenarios based on real-world processes, and the results show that our DRL method, which considers prefixes, learns a policy that performs, on average, 19.5% better than a DRL method that does not consider prefixes.
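A hedged sketch of what such a combined state could look like, assuming a simple bag-of-activities prefix encoding (the activity vocabulary and encoding are illustrative, not the thesis's exact representation):

```python
# Illustrative sketch of a state that combines process-level and case-level
# information; the activity vocabulary and the bag-of-activities prefix
# encoding are assumptions, not the thesis's exact representation.
ACTIVITIES = ["register", "check", "decide"]  # assumed vocabulary

def encode_state(queue_lengths, prefix):
    """queue_lengths: per-task-type queue sizes (process level);
    prefix: activities executed so far for the case (case level)."""
    prefix_counts = [prefix.count(a) for a in ACTIVITIES]
    return queue_lengths + prefix_counts

# A case that executed "register" and then "check", with queues of 4 and 1:
state = encode_state([4, 1], ["register", "check"])
assert state == [4, 1, 1, 1, 0]
```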
We further improve the action space representation. Existing DRL-based BPO approaches model the actions of the agent as all possible assignments of resources to tasks. While this works well for small-scale processes, it is not suitable for large processes with numerous possible assignments. For instance, in some processes, an agent can choose between hundreds of assignments, which requires many training samples before the DRL method converges to an optimal policy. Furthermore, heuristics, such as the shortest processing time heuristic, perform well for many resource allocation problems, and existing BPO methods typically achieve only marginal gains over them. For these reasons, we model the actions as a set of heuristics from which the agent can choose, limiting the size of the action space to the number of included heuristics. We evaluate our method on the six scenarios and on five real-world business processes. Across all eleven processes, our method learned the best policy, outperforming the best individual heuristic in six of them. On average, our method reduced the cycle time by 6.5% on the scenarios and 8.5% on the real-world business processes compared to the best heuristic for each process.
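A minimal sketch of such a heuristic action space follows; the `Task` attributes and the particular pair of heuristics are illustrative assumptions (the text only names shortest processing time), not the thesis's exact set.

```python
# Minimal sketch of a heuristic action space; the Task attributes and the
# two heuristics are illustrative assumptions, not the thesis's exact set.
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    arrival_time: float
    expected_duration: float

def spt(waiting, idle_resources):
    """Shortest processing time: serve the task with the smallest estimate."""
    return idle_resources[0], min(waiting, key=lambda t: t.expected_duration)

def fifo(waiting, idle_resources):
    """First in, first out: serve the longest-waiting task."""
    return idle_resources[0], min(waiting, key=lambda t: t.arrival_time)

# The agent's action is an index into this list, so the action space has
# len(HEURISTICS) entries regardless of how many assignments are possible.
HEURISTICS = [spt, fifo]

waiting = [Task("a", 0.0, 5.0), Task("b", 1.0, 2.0)]
resource, task = HEURISTICS[0](waiting, ["r1"])  # action 0 applies SPT
assert task.name == "b"  # the shorter job is served first
```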
In conclusion, this work contributes to the field of BPO by introducing a comprehensive DRL-based approach that addresses the limitations of existing methods and advances the state of the art in resource allocation in business processes.
| Original language | English |
|---|---|
| Qualification | Doctor of Philosophy |
| Awarding Institution | |
| Supervisors/Advisors | |
| Award date | 10 Feb 2026 |
| Place of Publication | Eindhoven |
| Publisher | |
| Print ISBNs | 978-90-386-6602-0 |
| Publication status | Accepted/In press - 10 Feb 2026 |