Ensemble-based Deep Reinforcement Learning for Vehicle Routing Problems under Distribution Shift

  • Yuan Jiang
  • Zhiguang Cao
  • Yaoxin Wu
  • Wen Song
  • Jie Zhang

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Academic, peer-review


Abstract

While performing favourably on independent and identically distributed (i.i.d.) instances, most existing neural methods for vehicle routing problems (VRPs) struggle to generalize under distribution shift. To tackle this issue, we propose an ensemble-based deep reinforcement learning method for VRPs that learns a group of diverse sub-policies to cope with various instance distributions. In particular, to prevent the parameters of the sub-policies from converging to the same values, we enforce diversity across sub-policies by leveraging Bootstrap with random initialization. Moreover, we explicitly pursue inequality between sub-policies by adding regularization terms during training, which further enhances diversity. Experimental results show that our method outperforms state-of-the-art neural baselines on randomly generated instances of various distributions. It also generalizes favourably to benchmark instances from TSPLib and CVRPLib. Together, these results confirm the effectiveness of the whole method and of its respective designs.
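The two diversity mechanisms named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation, and it rests on several assumptions:

  • "Bootstrap" is read as giving each sub-policy its own resample (with replacement) of the training instances.
  • A neural sub-policy is stood in for by a vector of randomly initialized logits over candidate next nodes.
  • The unspecified regularization term is replaced by a hypothetical one: the mean pairwise symmetric KL divergence between sub-policy distributions.

All function names and the `weight` parameter are illustrative only.

```python
import math
import random
from itertools import combinations


def bootstrap_sample(dataset, rng):
    """Bootstrap resample: draw len(dataset) items with replacement (assumed reading of 'Bootstrap')."""
    return [rng.choice(dataset) for _ in range(len(dataset))]


def init_logits(num_nodes, rng, scale=1.0):
    """Hypothetical stand-in for randomly initialized sub-policy parameters."""
    return [rng.gauss(0.0, scale) for _ in range(num_nodes)]


def softmax(logits):
    """Turn logits into a probability distribution over candidate next nodes."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q, eps=1e-12):
    """KL divergence KL(p || q); eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def diversity_bonus(policies):
    """Hypothetical regularizer: mean symmetric KL over all pairs of sub-policies."""
    pairs = list(combinations(policies, 2))
    if not pairs:
        return 0.0
    return sum(0.5 * (kl(p, q) + kl(q, p)) for p, q in pairs) / len(pairs)


def regularized_loss(task_losses, policies, weight=0.1):
    """Mean task loss minus a weighted diversity bonus, so more diverse sub-policies lower the loss."""
    return sum(task_losses) / len(task_losses) - weight * diversity_bonus(policies)


if __name__ == "__main__":
    rng = random.Random(0)
    instances = list(range(100))  # placeholders for training instances
    num_sub_policies = 3
    # Each sub-policy sees its own bootstrap resample and its own random initialization.
    subsets = [bootstrap_sample(instances, rng) for _ in range(num_sub_policies)]
    policies = [softmax(init_logits(5, rng)) for _ in range(num_sub_policies)]
    print(regularized_loss([1.0, 1.2, 0.9], policies))
```

Subtracting the bonus means that minimizing the loss pushes the sub-policies' output distributions apart. In this sketch, `weight` would need tuning so that the diversity term does not dominate the task loss.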
Original language: English
Title of host publication: Advances in Neural Information Processing Systems 36 (NeurIPS 2023)
Editors: A. Oh, T. Neumann, A. Globerson, K. Saenko, M. Hardt, S. Levine
Number of pages: 14
Publication status: Published - 2023
Event: 37th Conference on Neural Information Processing Systems, NeurIPS 2023 - New Orleans, United States
Duration: 10 Dec 2023 to 16 Dec 2023
Conference number: 37

Conference

Conference: 37th Conference on Neural Information Processing Systems, NeurIPS 2023
Abbreviated title: NeurIPS 2023
Country/Territory: United States
City: New Orleans
Period: 10/12/23 to 16/12/23
