TY - GEN
T1 - Deep Reinforcement Learning for Two-Sided Online Bipartite Matching in Collaborative Order Picking
AU - Begnardi, Luca
AU - Baier, Hendrik
AU - van Jaarsveld, Willem L.
AU - Zhang, Yingqian
PY - 2024
Y1 - 2024
N2 - As a growing number of warehouse operators are moving from human-only to Collaborative human-robot Order Picking solutions, more efficient picker routing policies are needed, since the complexity of coordinating multiple actors in the system increases significantly. The objective of these policies is to match human pickers and robot carriers to fulfill picking tasks, optimizing pick-rate and total tardiness of the orders. In this paper, we propose to formulate the order picking routing problem as a more general combinatorial optimization problem known as Two-sided Online Bipartite Matching. We present an end-to-end Deep Reinforcement Learning approach to optimize a combination of pick-rate and order tardiness, and to deal with the uncertainty of real-world warehouse environments. To extract and exploit spatial information from the environment, we devise three different Graph Neural Network architectures and empirically evaluate them on several scenarios of growing complexity in a simulation environment we developed. We show that all proposed methods significantly outperform greedy and more sophisticated heuristics, as well as non-GNN-based DRL approaches. Moreover, our methods exhibit good transferability properties, even when scaling up test problem instances to more than forty times the size of the ones the models were trained on.
KW - Deep Reinforcement Learning
KW - Graph Neural Networks
KW - Collaborative Order Picking
KW - Online Bipartite Matching
KW - Online Combinatorial Optimization
M3 - Conference contribution
T3 - Proceedings of Machine Learning Research (PMLR)
SP - 121
EP - 136
BT - Proceedings of the 15th Asian Conference on Machine Learning, ACML2023
A2 - Yanıkoğlu, Berrin
A2 - Buntine, Wray
PB - PMLR
T2 - 15th Asian Conference on Machine Learning
Y2 - 11 November 2023 through 14 November 2023
ER -