Abstract
This survey organizes the intricate literature on the design and optimization of emerging structures around post-trained LMs. We refer to this overarching structure as scaffolded LMs and focus on LMs that are integrated into multi-step processes with tools. We view scaffolded LMs as semi-parametric models wherein we train non-parametric variables, including the prompt, tools, and scaffold's code. In particular, scaffolded LMs interpret instructions, use tools, and receive feedback, all in language. Recent works use an LM as an optimizer to interpret language supervision and update non-parametric variables according to intricate objectives. In this survey, we refer to this paradigm as training of scaffolded LMs with language supervision. A key feature of non-parametric training is the ability to learn from language. By contrast, parametric training excels in learning from demonstrations (supervised learning), exploration (reinforcement learning), or observations (unsupervised learning), using well-defined loss functions. Language-based optimization enables rich, interpretable, and expressive objectives, while mitigating issues like catastrophic forgetting and supporting compatibility with closed-source models. Furthermore, agents are increasingly deployed as co-workers in real-world applications such as Copilot in Office tools or software development. In these mixed-autonomy settings, where control and decision-making are shared between human and AI, users point out errors or suggest corrections. Accordingly, we discuss agents that continuously improve by learning from this real-time, language-based feedback and refer to this setting as streaming learning from language supervision.