Reparameterized LLM Training via Orthogonal Equivalence Transformation

  • 2025-06-09 18:59:34
  • Zeju Qiu, Simon Buchholz, Tim Z. Xiao, Maximilian Dax, Bernhard Schölkopf, Weiyang Liu

Abstract

While large language models (LLMs) are driving the rapid advancement of artificial intelligence, effectively and reliably training these large models remains one of the field's most significant challenges. To address this challenge, we propose POET, a novel reParameterized training algorithm that uses Orthogonal Equivalence Transformation to optimize neurons. Specifically, POET reparameterizes each neuron with two learnable orthogonal matrices and a fixed random weight matrix. Because of its provable preservation of spectral properties of weight matrices, POET can stably optimize the objective function with improved generalization. We further develop efficient approximations that make POET flexible and scalable for training large-scale neural networks. Extensive experiments validate the effectiveness and scalability of POET in training LLMs.
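The spectral-preservation property can be illustrated with a small NumPy sketch. It assumes the reparameterized weight takes the form W = R W0 P, with W0 a fixed random matrix and R, P learnable orthogonal matrices. The Cayley transform used here to keep R and P orthogonal, and the helper names `skew`, `cayley`, `poet_weight`, `theta_R`, and `theta_P`, are hypothetical choices for illustration only, not necessarily the parameterization or efficient approximations POET itself uses.

```python
import numpy as np


def skew(a: np.ndarray) -> np.ndarray:
    """Hypothetical helper: map an arbitrary square matrix to a skew-symmetric one."""
    return a - a.T


def cayley(q: np.ndarray) -> np.ndarray:
    """Cayley transform of a skew-symmetric Q: R = (I - Q)^{-1} (I + Q).

    Assumption for this sketch: it yields an orthogonal R, and I - Q is always
    invertible because a skew-symmetric Q has purely imaginary eigenvalues.
    """
    eye = np.eye(q.shape[0])
    return np.linalg.solve(eye - q, eye + q)


def poet_weight(w0: np.ndarray, theta_r: np.ndarray, theta_p: np.ndarray) -> np.ndarray:
    """Hypothetical forward map W = R @ W0 @ P with W0 fixed and R, P orthogonal."""
    r = cayley(skew(theta_r))  # (m, m) orthogonal, learnable through theta_r
    p = cayley(skew(theta_p))  # (n, n) orthogonal, learnable through theta_p
    return r @ w0 @ p


rng = np.random.default_rng(0)
m, n = 6, 4
W0 = rng.standard_normal((m, n)) / np.sqrt(n)  # fixed random weights (Gaussian init is an assumption)
theta_R = 0.1 * rng.standard_normal((m, m))    # learnable parameters for R
theta_P = 0.1 * rng.standard_normal((n, n))    # learnable parameters for P

W = poet_weight(W0, theta_R, theta_P)

# Orthogonal factors change the singular vectors but leave the singular values intact.
sv_W = np.linalg.svd(W, compute_uv=False)
sv_W0 = np.linalg.svd(W0, compute_uv=False)
print(np.allclose(sv_W, sv_W0))  # True
```

In this sketch, any update to `theta_R` or `theta_P` moves the weight matrix while its singular values stay fixed at those of `W0`, which is one way to read the abstract's claim that spectral properties are preserved during training.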

 
