LLM Post-Training: A Deep Dive into Reasoning Large Language Models

Abstract

Large Language Models (LLMs) have transformed the natural language processing landscape and brought to life diverse applications. Pretraining on vast web-scale data has laid the foundation for these models, yet the research community is now increasingly shifting focus toward post-training techniques to achieve further breakthroughs. While pretraining provides a broad linguistic foundation, post-training methods enable LLMs to refine their knowledge, improve reasoning, enhance factual accuracy, and align more effectively with user intents and ethical considerations. Fine-tuning, reinforcement learning, and test-time scaling have emerged as critical strategies for optimizing LLM performance, ensuring robustness, and improving adaptability across various real-world tasks. This survey provides a systematic exploration of post-training methodologies, analyzing their role in refining LLMs beyond pretraining and addressing key challenges such as catastrophic forgetting, reward hacking, and inference-time trade-offs. We highlight emerging directions in model alignment, scalable adaptation, and inference-time reasoning, and outline future research directions. We also provide a public repository to continually track developments in this fast-evolving field: https://github.com/mbzuai-oryx/Awesome-LLM-Post-training.
