Abstract
Large Language Models (LLMs) excel at multi-step reasoning problems with explicit chain-of-thought (CoT), but verbose traces incur significant computational costs and memory overhead, and often carry redundant, stylistic artifacts. Latent reasoning has emerged as an efficient alternative that internalizes the thought process, but it suffers from a critical lack of supervision, limiting its effectiveness on complex, natural-language reasoning traces. In this work, we propose KaVa, the first framework that bridges this gap by distilling knowledge directly from a compressed KV-cache of the teacher into a latent-reasoning student via self-distillation, leveraging the representational flexibility of continuous latent tokens to align stepwise KV trajectories. We show that the abstract, unstructured knowledge within compressed KV-cache, which lacks direct token correspondence, can serve as a rich supervisory signal for a latent reasoning student. Empirically, the approach consistently outperforms strong latent baselines, exhibits markedly smaller degradation from equation-only to natural-language traces, and scales to larger backbones while preserving efficiency. These results establish compressed KV-cache distillation as a scalable supervision signal for latent reasoning, combining the accuracy of CoT-trained teachers with the efficiency and deployability of latent inference.