Physics of Skill Learning

Abstract

We aim to understand the physics of skill learning, i.e., how skills are learned in neural networks during training. We start by observing the Domino effect: skills are learned sequentially, and notably, some skills start learning right after others finish, much like the sequential fall of dominoes. To understand the Domino effect and related behaviors of skill learning, we take a physicist's approach of abstraction and simplification. We propose three models of varying complexity, the Geometry model, the Resource model, and the Domino model, which trade off realism against simplicity. The Domino effect can be reproduced in the Geometry model, whose resource interpretation inspires the Resource model, which can in turn be simplified to the Domino model. These models offer different levels of abstraction and simplification; each is useful for studying some aspects of skill learning. The Geometry model provides insights into neural scaling laws and optimizers; the Resource model sheds light on the learning dynamics of compositional tasks; the Domino model reveals the benefits of modularity. These models are not only conceptually interesting (e.g., we show how Chinchilla scaling laws can emerge from the Geometry model) but also useful in practice, inspiring algorithmic development (e.g., we show how simple algorithmic changes, motivated by these toy models, can speed up the training of deep learning models).
