LLM-JEPA: Large Language Models Meet Joint Embedding Predictive Architectures

Abstract

Large Language Model (LLM) pretraining, finetuning, and evaluation rely on input-space reconstruction and generative capabilities. Yet, it has been observed in vision that embedding-space training objectives, e.g., with Joint Embedding Predictive Architectures (JEPAs), are far superior to their input-space counterparts. That mismatch in how training is achieved between language and vision raises a natural question: can language training methods learn a few tricks from the vision ones? The lack of JEPA-style LLMs is a testament to the challenge of designing such objectives for language. In this work, we propose a first step in that direction: we develop LLM-JEPA, a JEPA-based solution for LLMs applicable to both finetuning and pretraining. Thus far, LLM-JEPA outperforms the standard LLM training objectives by a significant margin across models, all while being robust to overfitting. These findings are observed across numerous datasets (NL-RX, GSM8K, Spider, RottenTomatoes) and various models from the Llama3, OpenELM, Gemma2, and Olmo families. Code: https://github.com/rbalestr-lab/llm-jepa.
