
Learning 2-opt Heuristics for the Traveling Salesman Problem via Deep Reinforcement Learning

  • Paulo Roberto de O. da Costa
  • Jason Rhuggenaath
  • Yingqian Zhang
  • Alp Akcay

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Academic, peer-reviewed


Abstract

Recent works using deep learning to solve the Traveling Salesman Problem (TSP) have focused on learning construction heuristics. Such approaches find TSP solutions of good quality but require additional procedures such as beam search and sampling to improve solutions and achieve state-of-the-art performance. However, few studies have focused on improvement heuristics, where a given solution is improved until reaching a near-optimal one. In this work, we propose to learn a local search heuristic based on 2-opt operators via deep reinforcement learning. We propose a policy gradient algorithm to learn a stochastic policy that selects 2-opt operations given a current solution. Moreover, we introduce a policy neural network that leverages a pointing attention mechanism, which, unlike previous works, can be easily extended to more general k-opt moves. Our results show that the learned policies can improve even over random initial solutions and approach near-optimal solutions at a faster rate than previous state-of-the-art deep learning methods.
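To make the 2-opt operator mentioned in the abstract concrete, the following is a minimal sketch of a single 2-opt move on a tour, not the paper's implementation. The function names (`tour_length`, `two_opt_move`, `best_improvement_step`) are hypothetical, and the exhaustive best-improvement search is a classical stand-in for the learned stochastic policy, which would instead choose the pair of positions `(i, j)` itself.

```python
import math
from itertools import combinations


def tour_length(tour, coords):
    """Total length of the closed tour (it returns to the start city)."""
    n = len(tour)
    return sum(
        math.dist(coords[tour[k]], coords[tour[(k + 1) % n]])
        for k in range(n)
    )


def two_opt_move(tour, i, j):
    """Apply a 2-opt move: reverse the segment tour[i..j], inclusive."""
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]


def best_improvement_step(tour, coords):
    """Classical stand-in for the learned policy (an assumption of this
    sketch): try every pair (i, j) and keep the shortest resulting tour."""
    best, best_len = tour, tour_length(tour, coords)
    for i, j in combinations(range(len(tour)), 2):
        candidate = two_opt_move(tour, i, j)
        candidate_len = tour_length(candidate, coords)
        if candidate_len < best_len - 1e-12:
            best, best_len = candidate, candidate_len
    return best, best_len


if __name__ == "__main__":
    # Four cities on a unit square; the initial tour crosses itself.
    coords = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    tour, length = best_improvement_step([0, 2, 1, 3], coords)
    print(tour, length)
```

Reversing a segment replaces two edges of the tour with two others, which is what removes a crossing in the unit-square example. An improvement heuristic in the sense of the abstract repeats such moves from a given starting solution; here the move is picked greedily, whereas the paper learns the selection.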

Original language: English
Title of host publication: Asian Conference on Machine Learning, 18-20 November 2020, Bangkok, Thailand
Publisher: PMLR
Pages: 465-480
Number of pages: 16
Status: Published - 2020
Event: 12th Asian Conference on Machine Learning (virtual) - Bangkok, Thailand
Duration: 18 Nov 2020 to 20 Nov 2020

Publication series

Name: Proceedings of Machine Learning Research
Volume: 129
ISSN (print): 2640-3498

Conference

Conference: 12th Asian Conference on Machine Learning (virtual)
Abbreviated title: ACML2020
Country/Region: Thailand
City: Bangkok
Period: 18/11/20 to 20/11/20
