Abstract
Beyond neural scaling laws, little is known about the laws underlying large language models (LLMs). We introduce Neural Thermodynamic Laws (NTL), a new framework that offers fresh insights into LLM training dynamics. On the theoretical side, we demonstrate that key thermodynamic quantities (e.g., temperature, entropy, heat capacity, thermal conduction) and classical thermodynamic principles (e.g., the three laws of thermodynamics and the equipartition theorem) naturally emerge under river-valley loss landscape assumptions. On the practical side, this scientific perspective yields intuitive guidelines for designing learning rate schedules.