Recent advances in robotics and automation have enabled e-commerce warehouses to adopt new ways to stay competitive under highly volatile customer demand and shorter deadlines. Uniquely, we consider an autonomous robot-based order picking system that fulfils orders from a multi-deep gravity flow rack in a dynamic environment in which orders arrive continuously. For such a system, we make two decisions: (i) when to pick orders and (ii) which orders compose a batch. We study the online order batching problem with the objective of minimizing weighted earliness and tardiness: earliness leads to increased inventory holding costs, deterioration of perishable goods, or opportunity costs, while tardiness harms customer satisfaction. We formulate the problem as a semi-Markov decision process, which allows us to train a deep reinforcement learning (DRL) agent. The agent learns a policy by interacting with the environment and solves the problem using the Proximal Policy Optimization (PPO) algorithm. We evaluate the DRL agent against several benchmark heuristics. The agent learns a policy that produces feasible solutions superior to the benchmark heuristics in most of the tested cases. We demonstrate that the agent performs well under fluctuating order arrivals, which makes it effective and efficient, particularly for the online retailing of fast-moving consumer goods.
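The weighted earliness-tardiness objective described above can be sketched as follows; this is a minimal illustration, not the authors' implementation, and all names (completion times, due dates, per-order weights) are hypothetical assumptions:

```python
def weighted_earliness_tardiness(completion_times, due_dates, w_early, w_tardy):
    """Total weighted earliness-tardiness cost over a set of orders.

    For each order i with completion time c_i and due date d_i:
      earliness_i = max(0, d_i - c_i)
      tardiness_i = max(0, c_i - d_i)
    The cost sums w_early_i * earliness_i + w_tardy_i * tardiness_i.
    """
    total = 0.0
    for c, d, we, wt in zip(completion_times, due_dates, w_early, w_tardy):
        total += we * max(0.0, d - c) + wt * max(0.0, c - d)
    return total


# Example: order 1 finishes 2 units early, order 2 finishes 2 units late.
cost = weighted_earliness_tardiness(
    completion_times=[3.0, 7.0],
    due_dates=[5.0, 5.0],
    w_early=[1.0, 1.0],
    w_tardy=[2.0, 2.0],
)
```

In the dynamic setting studied here, such a cost would be accumulated over continuously arriving orders and used (negated) as the reward signal for the DRL agent.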
Publication status: Published - 5 Jul 2022
Event: EURO 2022 - AALTO University, Espoo, Finland
Duration: 3 Jul 2022 → 6 Jul 2022
Keywords:
- Deep Reinforcement Learning