Learning 2-opt Heuristics for the Traveling Salesman Problem via Deep Reinforcement Learning

Abstract

Recent works using deep learning to solve the Traveling Salesman Problem (TSP) have focused on learning construction heuristics. Such approaches find TSP solutions of good quality but require additional procedures such as beam search and sampling to improve solutions and achieve state-of-the-art performance. However, few studies have focused on improvement heuristics, where a given solution is improved until reaching a near-optimal one. In this work, we propose to learn a local search heuristic based on 2-opt operators via deep reinforcement learning. We propose a policy gradient algorithm to learn a stochastic policy that selects 2-opt operations given a current solution. Moreover, we introduce a policy neural network that leverages a pointing attention mechanism, which, unlike previous works, can be easily extended to more general k-opt moves. Our results show that the learned policies can improve even over random initial solutions and approach near-optimal solutions at a faster rate than previous state-of-the-art deep learning methods.
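
For readers unfamiliar with the operator, a single 2-opt operation can be sketched as follows. This is a minimal illustration of the standard 2-opt move (reversing a segment of the tour), not the paper's learned policy: the function names tour_length and two_opt_move are hypothetical, and the indices i and j are chosen by hand here, whereas in the proposed method they would be selected by the stochastic policy.

```python
import math


def tour_length(points, tour):
    """Total length of the closed tour visiting points in the given order."""
    n = len(tour)
    return sum(
        math.dist(points[tour[k]], points[tour[(k + 1) % n]])
        for k in range(n)
    )


def two_opt_move(tour, i, j):
    """Return a new tour with the segment tour[i..j] (inclusive) reversed.

    Assumes 0 <= i < j < len(tour). Reversing the segment replaces the two
    edges entering and leaving it with two new edges.
    """
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]


# Corners of a unit square, visited in an order whose edges cross.
points = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
tour = [0, 1, 2, 3]

# Hand-picked indices (assumption: a learned policy would output these).
improved = two_opt_move(tour, 1, 2)

print(tour_length(points, tour), tour_length(points, improved))
```

In this toy instance the move removes the two crossing diagonals, so the improved tour is the square's perimeter and is shorter than the initial one.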
