Discovering Hidden Algebraic Structures via Transformers with Rank-Aware Beam GRPO

  • 2025-08-21 17:58:50
  • Jaeha Lee, Gio Huh, Ning Su, Tony Yue YU

Abstract

Recent efforts have extended the capabilities of transformers in logical reasoning and symbolic computations. In this work, we investigate their capacity for non-linear latent pattern discovery in the context of functional decomposition, focusing on the challenging algebraic task of multivariate polynomial decomposition. This problem, with widespread applications in science and engineering, is proved to be NP-hard, and demands both precision and insight. Our contributions are threefold: First, we develop a synthetic data generation pipeline providing fine-grained control over problem complexity. Second, we train transformer models via supervised learning and evaluate them across four key dimensions involving scaling behavior and generalizability. Third, we propose Beam Grouped Relative Policy Optimization (BGRPO), a rank-aware reinforcement learning method suitable for hard algebraic problems. Finetuning with BGRPO improves accuracy while reducing beam width by up to half, resulting in approximately 75% lower inference compute. Additionally, our model demonstrates competitive performance in polynomial simplification, outperforming Mathematica in various cases.
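To make the task concrete, the following is a minimal sketch of what a decomposition instance can look like. It assumes the simplest form, a univariate outer polynomial g composed with a single multivariate inner polynomial h, so that f = g(h); the exact formulation used in the paper may be more general. The dict-based representation and the helper names (poly_add, poly_mul, compose) are hypothetical and chosen only for illustration. Composing is the easy direction; the task described above is the inverse, recovering g and h from the expanded f.

```python
from collections import defaultdict

# A polynomial is a dict mapping exponent tuples (one entry per variable)
# to integer coefficients, e.g. x^2 + y in (x, y) is {(2, 0): 1, (0, 1): 1}.

def poly_add(p, q):
    """Return p + q, dropping zero coefficients."""
    out = defaultdict(int)
    for poly in (p, q):
        for mono, coeff in poly.items():
            out[mono] += coeff
    return {m: c for m, c in out.items() if c != 0}

def poly_mul(p, q):
    """Return p * q, dropping zero coefficients."""
    out = defaultdict(int)
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            mono = tuple(a + b for a, b in zip(m1, m2))
            out[mono] += c1 * c2
    return {m: c for m, c in out.items() if c != 0}

def compose(outer_coeffs, inner, nvars):
    """Expand g(h) by Horner's rule, where g(t) = sum_k outer_coeffs[k] * t**k."""
    result = {}  # the zero polynomial
    for coeff in reversed(outer_coeffs):
        result = poly_mul(result, inner)
        if coeff != 0:
            result = poly_add(result, {(0,) * nvars: coeff})
    return result

# Illustrative instance (an assumption, not taken from the paper):
# h(x, y) = x^2 + y and g(t) = t^2 + 3t.
h = {(2, 0): 1, (0, 1): 1}
g = [0, 3, 1]
f = compose(g, h, nvars=2)
# f is the expanded polynomial x^4 + 2x^2*y + y^2 + 3x^2 + 3y
```

A model solving the decomposition task would be given only the expanded f and asked to produce a pair such as (g, h); checking a candidate answer then amounts to recomposing it and comparing against f.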

 
