Time-Constrained Multi-Agent Path Finding in Non-Lattice Graphs with Deep Reinforcement Learning

Marijn S. van Knippenberg, Mike J. Holenderski, V. Menkovski

Research output: Chapter in Book/Report/Conference proceedingConference contributionAcademicpeer-review

2 Citations (Scopus)

Abstract

Multi-Agent Path Finding (MAPF) is a routing problem in which multiple agents need to each find a lowest-cost collection of routes in a graph that avoids collisions between agents. This problem occurs frequently in the domain of logistics, for example in the routing of trains in shunting yards, airplanes at airports, and picking robots in automated warehouses. A solution is presented for the MAPF problem in which agents operate on an arbitrary directed graph, rather than the commonly assumed grid world, which extends support to use cases where the environment cannot be easily modeled in a grid shape. Furthermore, constraints are introduced on the start and end times of the routing tasks, which is vital in MAPF problems that are part of larger logistics systems. A Reinforcement Learning-based (RL) approach is proposed to learn a local routing policy for an agent in a manner that relieves the need for manually designing heuristics. It relies on a Graph Convolutional Network to handle arbitrary graphs. Both single-agent and multi-agent RL approaches are presented, showing how a multi-agent setup can reduce training time by exploiting the similarities in agent properties and local graph topologies.
Original languageEnglish
Title of host publicationAsian Conference on Machine Learning
Pages1317-1332
Number of pages16
Publication statusPublished - 30 Nov 2021

Publication series

NameProceedings of Machine Learning Research
Volume157

Keywords

  • Deep Reinforcement Learning
  • Graph Neural Networks
  • Multi-Agent Path Finding
  • Multi-Agent Reinforcement Learning
  • Optimization
  • Routing

Fingerprint

Dive into the research topics of 'Time-Constrained Multi-Agent Path Finding in Non-Lattice Graphs with Deep Reinforcement Learning'. Together they form a unique fingerprint.

Cite this