Program Synthesis Through Reinforcement Learning Guided Tree Search

Abstract

Program Synthesis is the task of generating a program from a provided specification. Traditionally, this has been treated as a search problem by the programming languages (PL) community and more recently as a supervised learning problem by the machine learning community. Here, we propose a third approach, representing the task of synthesizing a given program as a Markov decision process solvable via reinforcement learning (RL). From observations about the states of partial programs, we attempt to find a program that is optimal over a provided reward metric on pairs of programs and states. We instantiate this approach on a subset of the RISC-V assembly language operating on floating-point numbers, and as an optimization inspired by search-based techniques from the PL community, we combine RL with a priority search tree. We evaluate this instantiation and demonstrate the effectiveness of our combined method compared to a variety of baselines, including a pure RL ablation and a state-of-the-art Markov chain Monte Carlo search method on this task.
