Grokking Modular Polynomials

  • 2024-06-05 18:59:35
  • Darshil Doshi, Tianyu He, Aritra Das, Andrey Gromov
Neural networks readily learn a subset of the modular arithmetic tasks while failing to generalize on the rest. This limitation persists regardless of the choice of architecture and training strategy. On the other hand, an analytical solution for the weights of Multi-layer Perceptron (MLP) networks that generalize on the modular addition task is known in the literature. In this work, we (i) extend the class of analytical solutions to include modular multiplication as well as modular addition with many terms, and show that real networks trained on these datasets learn similar solutions upon generalization (grokking); (ii) combine these "expert" solutions to construct networks that generalize on arbitrary modular polynomials; and (iii) hypothesize a classification of modular polynomials into learnable and non-learnable via neural network training, and provide experimental evidence supporting our claims.
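For intuition about what an "analytical solution for the weights" of a modular-addition MLP can look like, here is a minimal sketch of one construction of that general type. It is not taken from this work. It assumes a 2-layer MLP with quadratic activation, inputs given as two concatenated one-hot vectors for n and m, and first-layer and output weights that are cosines of frequency 2πσ/p with per-neuron phases. The helpers `modular_addition_mlp`, `forward`, and `predict` are hypothetical names. The phases are placed on a uniform 4×4 grid instead of being drawn at random, which is a simplification chosen here so that the interference terms cancel exactly.

```python
import math


def modular_addition_mlp(p: int, grid: int = 4):
    """Hypothetical helper: cosine weights for a 2-layer MLP with x**2 activation.

    Each neuron has a frequency sigma in 1..p-1 and two phases (phi1, phi2).
    Phases lie on a uniform grid (an assumption) so cross terms average out exactly.
    """
    w_in_n, w_in_m, w_out = [], [], []
    for sigma in range(1, p):
        theta = 2 * math.pi * sigma / p
        for j in range(grid):
            for l in range(grid):
                phi1 = 2 * math.pi * j / grid
                phi2 = 2 * math.pi * l / grid
                # First-layer weights acting on the one-hot encodings of n and m.
                w_in_n.append([math.cos(theta * n + phi1) for n in range(p)])
                w_in_m.append([math.cos(theta * m + phi2) for m in range(p)])
                # Output weights for each candidate answer q.
                w_out.append([math.cos(theta * q + phi1 + phi2) for q in range(p)])
    return w_in_n, w_in_m, w_out


def forward(weights, n: int, m: int):
    """Return the p output logits for inputs (n, m)."""
    w_in_n, w_in_m, w_out = weights
    p = len(w_out[0])
    logits = [0.0] * p
    for wn, wm, wo in zip(w_in_n, w_in_m, w_out):
        h = wn[n] + wm[m]  # a one-hot input just selects one weight column
        a = h * h          # quadratic activation
        for q in range(p):
            logits[q] += wo[q] * a
    return logits


def predict(weights, n: int, m: int) -> int:
    """Predicted answer: index of the largest logit."""
    logits = forward(weights, n, m)
    return max(range(len(logits)), key=logits.__getitem__)


p = 11
weights = modular_addition_mlp(p)
accuracy = sum(
    predict(weights, n, m) == (n + m) % p for n in range(p) for m in range(p)
) / p**2
print(accuracy)  # 1.0
```

Expanding the squared activation, the only phase-independent part of each neuron's contribution to logit q is ½·cos(2πσ(n + m − q)/p). Summed over the 16 phase pairs and all frequencies σ = 1, …, p − 1, this gives 8(p − 1) when q ≡ n + m (mod p) and −8 otherwise. With randomly drawn phases the remaining terms would cancel only approximately, improving as the network gets wider.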


