Abstract
Enabling neural networks to learn complex logical constraints and perform symbolic reasoning is a critical challenge. Bridging this gap often requires guiding the neural network's output distribution toward the symbolic constraints. Diffusion models have shown remarkable generative capability across various domains, and we employ this architecture to perform neuro-symbolic learning and solve logical puzzles. Our diffusion-based pipeline adopts a two-stage training strategy: the first stage focuses on cultivating basic reasoning abilities, while the second emphasizes systematic learning of logical constraints. To impose hard constraints on neural outputs in the second stage, we formulate the diffusion reasoner as a Markov decision process and fine-tune it with an improved proximal policy optimization algorithm. We use a rule-based reward signal derived from the logical consistency of neural outputs and adopt a flexible strategy to optimize the diffusion reasoner's policy. We evaluate our methodology on several classical symbolic reasoning benchmarks, including Sudoku, Maze, pathfinding, and preference learning. Experimental results demonstrate that our approach achieves outstanding accuracy and logical consistency among neural networks.
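
As an illustration of what a rule-based reward derived from logical consistency could look like, the following is a minimal sketch for the Sudoku case. It assumes a binary reward (1.0 when every row, column, and 3x3 box contains the digits 1 through 9 exactly once, 0.0 otherwise); the abstract does not specify the exact reward, and the name `sudoku_reward` is hypothetical.

```python
from typing import List

Grid = List[List[int]]


def sudoku_reward(grid: Grid) -> float:
    """Hypothetical binary reward: 1.0 if a 9x9 grid is a valid Sudoku solution, else 0.0.

    Assumption: the reward only checks logical consistency of the final output;
    it is not the reward defined in the paper.
    """
    # Reject anything that is not a 9x9 grid.
    if len(grid) != 9 or any(len(row) != 9 for row in grid):
        return 0.0

    target = set(range(1, 10))

    # Collect the digits in every row, column, and 3x3 box.
    rows = [set(row) for row in grid]
    cols = [set(col) for col in zip(*grid)]
    boxes = [
        {grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
        for br in range(0, 9, 3)
        for bc in range(0, 9, 3)
    ]

    # Every unit must contain exactly the digits 1..9.
    return 1.0 if all(unit == target for unit in rows + cols + boxes) else 0.0
```

In a policy-optimization setting, a signal of this kind would be computed on the fully denoised output and used as the terminal reward of the trajectory; a denser variant (for example, the fraction of satisfied units) is another possible design choice.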