Abstract
The application of Deep Reinforcement Learning (DRL) to inventory management is an emerging field. However, traditional DRL algorithms, originally developed for diverse domains such as game-playing and robotics, may not be well-suited for the specific challenges posed by inventory management. Consequently, these algorithms often fail to outperform established heuristics; for instance, no existing DRL approach consistently surpasses the capped base-stock policy in lost sales inventory control. This highlights a critical gap in the practical application of DRL to inventory management: the highly stochastic nature of inventory problems requires tailored solutions. In response, we propose Deep Controlled Learning (DCL), a new DRL algorithm designed for highly stochastic problems. DCL is based on approximate policy iteration and incorporates an efficient simulation mechanism, combining Sequential Halving with Common Random Numbers. Our numerical studies demonstrate that DCL consistently outperforms state-of-the-art heuristics and DRL algorithms across various inventory settings, including lost sales, perishable inventory systems, and inventory systems with random lead times. DCL achieves lower average costs in all test cases while maintaining an optimality gap of no more than 0.2%. Remarkably, this performance is achieved using the same hyperparameter set across all experiments, underscoring the robustness and generalizability of our approach. These findings contribute to the ongoing exploration of tailored DRL algorithms for inventory management, providing a foundation for further research and practical application in this area.
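To illustrate the simulation mechanism mentioned above, the sketch below shows how Sequential Halving can be combined with Common Random Numbers (CRN) to select the lowest-cost candidate from noisy simulations. This is an illustrative reconstruction based only on the abstract, not the authors' DCL implementation; the function names, the toy lost-sales cost model, and all parameter values are assumptions for demonstration.

```python
import math
import random


def sequential_halving_crn(candidates, simulate, budget):
    """Pick the candidate with the lowest estimated mean cost using
    Sequential Halving with Common Random Numbers (CRN).

    simulate(candidate, seed) -> sampled cost of `candidate` under the
    random scenario fixed by `seed` (lower is better).
    budget -> total number of simulation calls to spend.
    """
    # Each entry: [candidate, running cost total, sample count].
    stats = [[c, 0.0, 0] for c in candidates]
    rounds = max(1, math.ceil(math.log2(len(stats))))
    per_round = budget // rounds
    next_seed = 0
    while len(stats) > 1:
        n = max(1, per_round // len(stats))
        # CRN: every surviving candidate is evaluated on the SAME seeds,
        # so cost differences reflect the candidates rather than the
        # simulation noise.
        seeds = range(next_seed, next_seed + n)
        next_seed += n
        for entry in stats:
            entry[1] += sum(simulate(entry[0], s) for s in seeds)
            entry[2] += n
        # Keep the better half (lowest average cost so far).
        stats.sort(key=lambda e: e[1] / e[2])
        stats = stats[: max(1, len(stats) // 2)]
    return stats[0][0]


def lost_sales_cost(order_qty, seed):
    # Hypothetical one-period lost-sales cost: unit holding cost 1,
    # lost-sales penalty 5, demand ~ N(10, 3) fixed by the seed.
    demand = random.Random(seed).gauss(10, 3)
    return max(order_qty - demand, 0) + 5 * max(demand - order_qty, 0)


# Compare candidate order quantities 5..19 under a shared random stream.
best = sequential_halving_crn(range(5, 20), lost_sales_cost, budget=3000)
```

Because each surviving candidate is replayed against identical demand scenarios, the pairwise cost differences have far lower variance than with independent sampling, which is what makes the halving decisions reliable at modest simulation budgets.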
| Original language | English |
|---|---|
| Pages (from-to) | 104-117 |
| Number of pages | 14 |
| Journal | European Journal of Operational Research |
| Volume | 324 |
| Issue number | 1 |
| DOIs | |
| Status | Published - 1 Jul 2025 |
Funding
The authors thank the editor and the three anonymous reviewers for their valuable comments and suggestions, which have significantly improved this paper. Tarkan Temizöz conducted his research in the project DynaPlex: Deep Reinforcement Learning for Data-Driven Logistics, made possible by TKI Dinalog and the Topsector Logistics and funded by the Ministry of Economic Affairs and Climate Policy. We acknowledge the support of the SURF Cooperative using grant no. EINF-5192.