Abstract
Deep Reinforcement Learning (RL) has achieved considerable success in solving routing problems. However, state-of-the-art deep RL approaches require a large amount of data before they reach reasonable performance. This may be acceptable for small problems, but as instances grow, it severely limits the applicability of these methods to many real-world instances. In this work, we study a setting where the agent can access data from previously handcrafted heuristics for the Traveling Salesman Problem; specifically, the agent has access to demonstrations from 2-opt improvement policies. Our goal is to learn policies that surpass the quality of the demonstrations while requiring fewer samples than pure RL. We propose to first learn policies with Imitation Learning (IL), leveraging a small set of demonstration data to accelerate policy learning. Afterward, we combine on-policy and value-approximation updates to improve upon the expert's performance. We show that our method learns good policies in less time and with less data than classical policy gradient, which does not incorporate demonstration data into RL. Moreover, in terms of solution quality, it performs comparably to other state-of-the-art deep RL approaches.
Original language | English |
---|---|
Title | 2021 International Joint Conference on Neural Networks (IJCNN) |
Publisher | Institute of Electrical and Electronics Engineers |
Number of pages | 8 |
Electronic ISBN | 978-1-6654-3900-8 |
DOIs | |
Status | Published - 20 Sep 2021 |
Event | 2021 International Joint Conference on Neural Networks, IJCNN 2021 - Shenzhen, China. Duration: 18 Jul 2021 → 22 Jul 2021 |
Conference

Conference | 2021 International Joint Conference on Neural Networks, IJCNN 2021 |
---|---|
Abbreviated title | IJCNN 2021 |
Country/Region | China |
City | Shenzhen |
Period | 18/07/21 → 22/07/21 |