The Llama 3 Herd of Models

  • 2024-07-31 18:54:27
  • Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava Spataru, Baptiste Roziere, Bethany Biron, Binh Tang, Bobbie Chern, Charlotte Caucheteux, Chaya Nayak, Chloe Bi, Chris Marra, Chris McConnell, Christian Keller, Christophe Touret, Chunyang Wu, Corinne Wong, Cristian Canton Ferrer, Cyrus Nikolaidis, Damien Allonsius, Daniel Song, Danielle Pintz, Danny Livshits, David Esiobu, Dhruv Choudhary, Dhruv Mahajan, Diego Garcia-Olano, Diego Perino, Dieuwke Hupkes, Egor Lakomkin, Ehab AlBadawy, Elina Lobanova, Emily Dinan, Eric Michae

Abstract

Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language models such as GPT-4 on a plethora of tasks. We publicly release Llama 3, including pre-trained and post-trained versions of the 405B parameter language model and our Llama Guard 3 model for input and output safety. The paper also presents the results of experiments in which we integrate image, video, and speech capabilities into Llama 3 via a compositional approach. We observe this approach performs competitively with the state-of-the-art on image, video, and speech recognition tasks. The resulting models are not yet being broadly released as they are still under development.

 
