Deep Reinforcement Learning for a Multi-Objective Online Order Batching Problem

Martijn Beeks, Reza Refaei Afshar, Yingqian Zhang, Remco Dijkman, Claudy van Dorst, Stijn de Looijer

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

7 Citations (Scopus)
531 Downloads (Pure)

Abstract

On-time delivery and low service costs are two important performance metrics in warehousing operations. This paper proposes a Deep Reinforcement Learning (DRL) based approach to solve the online Order Batching and Sequence Problem (OBSP) to optimize these two objectives. To learn how to balance the trade-off between the two objectives, we introduce a Bayesian optimization framework to shape the reward function of the DRL agent, so that the influence of each objective on learning is adjusted to the environment. We compare our approach with several heuristics using problem instances of real-world size where thousands of orders arrive dynamically per hour. We show that the Proximal Policy Optimization (PPO) algorithm with Bayesian optimization outperforms the heuristics in all tested scenarios on both objectives. In addition, it finds different weights for the components of the reward function in different scenarios, indicating its capability of learning how to set the importance of the two objectives under different environments. We also provide a policy analysis of the learned DRL agent, where a decision tree is used to infer decision rules to enable the interpretability of the DRL approach.
Original language: English
Title of host publication: Proceedings of the 32nd International Conference on Automated Planning and Scheduling, ICAPS 2022
Editors: Akshat Kumar, Sylvie Thiebaux, Pradeep Varakantham, William Yeoh
Publisher: AAAI Press
Pages: 435-443
Number of pages: 9
ISBN (electronic): 9781577358749
Status: Published - 13 Jun 2022
Event: 32nd International Conference on Automated Planning and Scheduling, ICAPS 2022 - Virtual, Singapore, Singapore
Duration: 13 Jun 2022 to 24 Jun 2022
Conference number: 32
http://icaps22.icaps-conference.org/

Conference

Conference: 32nd International Conference on Automated Planning and Scheduling, ICAPS 2022
Abbreviated title: ICAPS
Country/Territory: Singapore
City: Singapore
Period: 13/06/22 to 24/06/22