In this paper, we propose an approximate dynamic programming (ADP) algorithm to solve a Markov decision process (MDP) formulation for the admission control of elective patients. To manage the elective patients from multiple specialties equitably and efficiently, we establish a waiting list and assign each patient a time-dependent dynamic priority score. Then, taking the random arrivals of patients into account, sequential decisions are made on a weekly basis. At the end of each week, we select the patients to be treated in the following week from the waiting list. By minimizing the cost function of the MDP over an infinite horizon, we seek to achieve the best trade-off between the patients’ waiting times and the over-utilization of surgical resources. Considering the curses of dimensionality resulting from the large scale of realistically sized problems, we first analyze the structural properties of the MDP and propose an algorithm that facilitates the search for best actions. We then develop a novel reinforcement-learning-based ADP algorithm as the solution technique. Experimental results reveal that the proposed algorithms consume much less computation time in comparison with that required by conventional dynamic programming methods. Additionally, the algorithms are shown to be capable of computing high-quality near-optimal policies for realistically sized problems.
Bibliographical noteFunding Information:
The authors gratefully acknowledge the financial support granted by the China Scholarship Council (CSC, Grant No. 201604490106).
© 2021 Elsevier Ltd
Copyright 2021 Elsevier B.V., All rights reserved.
- Approximate dynamic programming
- Markov decision process
- Operating theatre planning
- Patient admission control
- Recursive least-squares temporal difference learning