Abstract
Recent advances in robotics and automation have enabled warehouses in the e-commerce era to adopt new ways of staying competitive under highly volatile customer demand and shorter deadlines. We consider an autonomous robot-based order picking system that fulfils orders from a multi-deep gravity flow rack in a dynamic environment in which orders arrive continuously. For such a system, we make two decisions: (i) when to pick orders and (ii) which orders compose a batch. We study the online order batching problem with the objective of minimizing the weighted earliness and tardiness. Earliness results in increased inventory holding costs, deterioration of perishable goods, or opportunity costs, whereas tardiness is undesirable with regard to customer satisfaction. We formulate the problem as a semi-Markov decision process, which allows us to train a deep reinforcement learning (DRL) agent. The agent learns a policy by interacting with the environment, using the Proximal Policy Optimization algorithm. We evaluate the performance of the DRL agent against several benchmark heuristics. The agent learns a policy that produces feasible solutions superior to those of the benchmark heuristics in most of the tested cases. We demonstrate that the learning agent shows promising performance in a fluctuating order environment, which suggests that it is effective and efficient, particularly in the online retailing of fast-moving consumer goods.
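To make the objective concrete, the following is a minimal sketch of how weighted earliness and tardiness could be computed for a set of picked orders. The abstract does not give the exact formulation, so the additive form, the `Order` fields, and the weights `w_early` and `w_tardy` are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Order:
    # Hypothetical fields; the abstract does not specify the order data model.
    due_time: float         # deadline of the order
    completion_time: float  # time at which picking of the order finishes


def weighted_earliness_tardiness(orders, w_early=1.0, w_tardy=1.0):
    """Sum of weighted earliness and tardiness over a set of orders.

    Assumed form: each order contributes w_early * max(0, due - done)
    plus w_tardy * max(0, done - due).
    """
    total = 0.0
    for o in orders:
        earliness = max(0.0, o.due_time - o.completion_time)
        tardiness = max(0.0, o.completion_time - o.due_time)
        total += w_early * earliness + w_tardy * tardiness
    return total


orders = [
    Order(due_time=10.0, completion_time=7.0),   # finished 3 time units early
    Order(due_time=10.0, completion_time=14.0),  # finished 4 time units late
]
cost = weighted_earliness_tardiness(orders, w_early=0.5, w_tardy=2.0)
print(cost)  # 0.5 * 3 + 2.0 * 4 = 9.5
```

In a reinforcement learning setting, a quantity of this kind would typically enter the reward with a negative sign, so that a policy that picks neither too early nor too late is rewarded. How the reward is actually defined in the study is not stated in the abstract.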
Original language | English |
---|---|
Publication status | Published - 5 Jul 2022 |
Event | EURO 2022 - Aalto University, Espoo, Finland. Duration: 3 Jul 2022 → 6 Jul 2022. https://euro2022espoo.com/ |
Conference
Conference | EURO 2022 |
---|---|
Country/Territory | Finland |
City | Espoo |
Period | 3/07/22 → 6/07/22 |
Keywords
- Warehousing
- E-commerce
- Deep Reinforcement Learning