Abstract
Modern artificial intelligence (AI) systems are powered by foundation models.This paper presents a new set of foundation models, called Llama 3. It is aherd of language models that natively support multilinguality, coding,reasoning, and tool usage. Our largest model is a dense Transformer with 405Bparameters and a context window of up to 128K tokens. This paper presents anextensive empirical evaluation of Llama 3. We find that Llama 3 deliverscomparable quality to leading language models such as GPT-4 on a plethora oftasks. We publicly release Llama 3, including pre-trained and post-trainedversions of the 405B parameter language model and our Llama Guard 3 model forinput and output safety. The paper also presents the results of experiments inwhich we integrate image, video, and speech capabilities into Llama 3 via acompositional approach. We observe this approach performs competitively withthe state-of-the-art on image, video, and speech recognition tasks. Theresulting models are not yet being broadly released as they are still underdevelopment.