Abstract
Automated theorem provers have traditionally relied on manually tunedheuristics to guide how they perform proof search. Deep reinforcement learninghas been proposed as a way to obviate the need for such heuristics, however,its deployment in automated theorem proving remains a challenge. In this paperwe introduce TRAIL, a system that applies deep reinforcement learning tosaturation-based theorem proving. TRAIL leverages (a) a novel neuralrepresentation of the state of a theorem prover and (b) a novelcharacterization of the inference selection process in terms of anattention-based action policy. We show through systematic analysis that thesemechanisms allow TRAIL to significantly outperform previousreinforcement-learning-based theorem provers on two benchmark datasets forfirst-order logic automated theorem proving (proving around 15% more theorems).