Learning 2-Opt Heuristics for Routing Problems via Deep Reinforcement Learning

Onderzoeksoutput: Bijdrage aan tijdschriftTijdschriftartikelAcademicpeer review

5 Downloads (Pure)


Recent works using deep learning to solve routing problems such as the traveling salesman problem (TSP) have focused on learning construction heuristics. Such approaches find good quality solutions but require additional procedures such as beam search and sampling to improve solutions and achieve state-of-the-art performance. However, few studies have focused on improvement heuristics, where a given solution is improved until reaching a near-optimal one. In this work, we propose to learn a local search heuristic based on 2-opt operators via deep reinforcement learning. We propose a policy gradient algorithm to learn a stochastic policy that selects 2-opt operations given a current solution. Moreover, we introduce a policy neural network that leverages a pointing attention mechanism, which can be easily extended to more general k-opt moves. Our results show that the learned policies can improve even over random initial solutions and approach near-optimal solutions faster than previous state-of-the-art deep learning methods for the TSP. We also show we can adapt the proposed method to two extensions of the TSP: the multiple TSP and the Vehicle Routing Problem, achieving results on par with classical heuristics and learned methods.
Originele taal-2Engels
Aantal pagina's16
TijdschriftSN Computer Science
StatusGepubliceerd - 23 jul 2021


Duik in de onderzoeksthema's van 'Learning 2-Opt Heuristics for Routing Problems via Deep Reinforcement Learning'. Samen vormen ze een unieke vingerafdruk.

Citeer dit