Learning 2-opt Local Search from Heuristics as Expert Demonstrations

Onderzoeksoutput: Hoofdstuk in Boek/Rapport/CongresprocedureConferentiebijdrageAcademicpeer review

1 Citaat (Scopus)
203 Downloads (Pure)

Samenvatting

Deep Reinforcement Learning (RL) has achieved high success in solving routing problems. However, state-of-the-art deep RL approaches require a considerable amount of data before they reach reasonable performance. This may be acceptable for small problems, but as instances grow bigger, this fact severely limits the applicability of these methods to many real-world instances. In this work, we study a setting where the agent can access data from previously handcrafted heuristics for the Traveling Salesman Problem. In our setting, the agent has access to demonstrations from 2-opt improvement policies. Our goal is to learn policies that can surpass the quality of the demonstrations while requiring fewer samples than pure RL. In this study, we propose to first learn policies with Imitation Learning (IL), leveraging a small set of demonstration data to accelerate policy learning. Afterward, we combine on policy and value approximation updates to improve performance over the expert's performance. We show that our method learns good policies in a shorter time and using less data than classical policy gradient, which does not incorporate demonstration data into RL. Moreover, in terms of solution quality, it performs similarly to other state-of-the-art deep RL approaches.
Originele taal-2Engels
Titel 2021 International Joint Conference on Neural Networks (IJCNN)
UitgeverijInstitute of Electrical and Electronics Engineers
Aantal pagina's8
ISBN van elektronische versie978-1-6654-3900-8
DOI's
StatusGepubliceerd - 20 sep. 2021
Evenement2021 International Joint Conference on Neural Networks, IJCNN 2021 - Shenzhen, China
Duur: 18 jul. 202122 jul. 2021

Congres

Congres2021 International Joint Conference on Neural Networks, IJCNN 2021
Verkorte titelIJCNN 2021
Land/RegioChina
StadShenzhen
Periode18/07/2122/07/21

Vingerafdruk

Duik in de onderzoeksthema's van 'Learning 2-opt Local Search from Heuristics as Expert Demonstrations'. Samen vormen ze een unieke vingerafdruk.

Citeer dit