TY - JOUR
T1 - Capacity planning in logistics corridors
T2 - Deep reinforcement learning for the dynamic stochastic temporal bin packing problem
AU - Farahani, Amirreza
AU - Genga, Laura
AU - Schrotenboer, Albert H.
AU - Dijkman, Remco
N1 - Publisher Copyright:
© 2024 The Author(s)
PY - 2024/11
Y1 - 2024/11
N2 - This paper addresses the challenge of managing uncertainty in the daily capacity planning of a terminal in a corridor-based logistics system. Corridor-based logistics systems facilitate the exchange of freight between two distinct regions, usually involving industrial and logistics clusters. In this context, we introduce the dynamic stochastic temporal bin packing problem. It models the assignment of individual containers to carriers’ trucks over discrete time units in real time. We formulate it as a Markov decision process (MDP). Two distinguishing characteristics of our problem are the stochastic nature of the time-dependent availability of containers, i.e., container delays, and the continuous-time, or dynamic, aspect of the planning, where a container announcement may occur at any moment during the planning horizon. We introduce an innovative real-time planning algorithm based on Proximal Policy Optimization (PPO), a Deep Reinforcement Learning (DRL) method, to allocate individual containers to eligible carriers in real time. In addition, we propose several practical heuristics and two novel rolling-horizon batch-planning methods based on (stochastic) mixed-integer programming (MIP), which can be interpreted as computational information relaxation bounds because they delay decision making. The results show that our proposed DRL method outperforms the practical heuristics and effectively scales to larger problem instances, as opposed to the stochastic MIP-based approach, making our DRL method a practically appealing solution.
AB - This paper addresses the challenge of managing uncertainty in the daily capacity planning of a terminal in a corridor-based logistics system. Corridor-based logistics systems facilitate the exchange of freight between two distinct regions, usually involving industrial and logistics clusters. In this context, we introduce the dynamic stochastic temporal bin packing problem. It models the assignment of individual containers to carriers’ trucks over discrete time units in real time. We formulate it as a Markov decision process (MDP). Two distinguishing characteristics of our problem are the stochastic nature of the time-dependent availability of containers, i.e., container delays, and the continuous-time, or dynamic, aspect of the planning, where a container announcement may occur at any moment during the planning horizon. We introduce an innovative real-time planning algorithm based on Proximal Policy Optimization (PPO), a Deep Reinforcement Learning (DRL) method, to allocate individual containers to eligible carriers in real time. In addition, we propose several practical heuristics and two novel rolling-horizon batch-planning methods based on (stochastic) mixed-integer programming (MIP), which can be interpreted as computational information relaxation bounds because they delay decision making. The results show that our proposed DRL method outperforms the practical heuristics and effectively scales to larger problem instances, as opposed to the stochastic MIP-based approach, making our DRL method a practically appealing solution.
KW - Bin packing
KW - Continuous-time planning
KW - Deep reinforcement learning
KW - Logistics
KW - Logistics corridors
KW - Real-time planning
KW - Stochastic programming
KW - Transportation
UR - http://www.scopus.com/inward/record.url?scp=85202557882&partnerID=8YFLogxK
U2 - 10.1016/j.tre.2024.103742
DO - 10.1016/j.tre.2024.103742
M3 - Article
AN - SCOPUS:85202557882
SN - 1366-5545
VL - 191
JO - Transportation Research Part E: Logistics and Transportation Review
JF - Transportation Research Part E: Logistics and Transportation Review
M1 - 103742
ER -