Abstract
We consider the task of program synthesis in the presence of a rewardfunction over the output of programs, where the goal is to find programs withmaximal rewards. We employ an iterative optimization scheme, where we train anRNN on a dataset of K best programs from a priority queue of the generatedprograms so far. Then, we synthesize new programs and add them to the priorityqueue by sampling from the RNN. We benchmark our algorithm, called priorityqueue training (or PQT), against genetic algorithm and reinforcement learningbaselines on a simple but expressive Turing complete programming languagecalled BF. Our experimental results show that our simple PQT algorithmsignificantly outperforms the baselines. By adding a program length penalty tothe reward function, we are able to synthesize short, human readable programs.