Voxtral

  • 2025-07-17 16:17:37
  • Alexander H. Liu, Andy Ehrenberg, Andy Lo, Clément Denoix, Corentin Barreau, Guillaume Lample, Jean-Malo Delignon, Khyathi Raghavi Chandu, Patrick von Platen, Pavankumar Reddy Muddireddy, Sanchit Gandhi, Soham Ghosh, Srijan Mishra, Thomas Foubert, Abhinav Rastogi, Adam Yang, Albert Q. Jiang, Alexandre Sablayrolles, Amélie Héliou, Amélie Martin, Anmol Agarwal, Antoine Roux, Arthur Darcet, Arthur Mensch, Baptiste Bout, Baptiste Rozière, Baudouin De Monicault, Chris Bamford, Christian Wallenwein, Christophe Renaudin, Clémence Lanfranchi, Darius Dabert, Devendra Singh Chaplot, Devon Mizelle, Diego de las Casas, Elliot Chane-Sane, Emilien Fugier, Emma Bou Hanna, Gabrielle Berrada, Gauthier Delerce, Gauthier Guinet, Georgii Novikov, Guillaume Martin, Himanshu Jaju, Jan Ludziejewski, Jason Rute,

Abstract

We present Voxtral Mini and Voxtral Small, two multimodal audio chat models. Voxtral is trained to comprehend both spoken audio and text documents, achieving state-of-the-art performance across a diverse range of audio benchmarks while preserving strong text capabilities. Voxtral Small outperforms a number of closed-source models while being small enough to run locally. A 32K context window enables the model to handle audio files up to 40 minutes in duration as well as long multi-turn conversations. We also contribute three benchmarks for evaluating speech understanding models on knowledge and trivia. Both Voxtral models are released under the Apache 2.0 license.
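The context-length claim implies a simple budget: if 40 minutes of audio must fit inside a 32K-token window, the audio representation can spend at most roughly 13 to 14 tokens per second. The sketch below works through that arithmetic only. The function name `max_audio_token_rate` is hypothetical, the calculation assumes the entire window is available for audio (ignoring text prompt and response tokens), and "32K" is evaluated both as 32,000 and as 32,768 tokens because the abstract does not say which is meant.

```python
# Hypothetical helper: upper bound on audio tokens per second implied by
# fitting `duration_min` minutes of audio into `context_tokens` tokens.
# Assumption (illustrative only): the whole context is spent on audio,
# with no room reserved for text prompts or model responses.
def max_audio_token_rate(context_tokens: int, duration_min: float) -> float:
    seconds = duration_min * 60  # 40 minutes -> 2400 seconds
    return context_tokens / seconds


if __name__ == "__main__":
    # "32K" is ambiguous, so both common readings are shown.
    for ctx in (32_000, 32 * 1024):
        rate = max_audio_token_rate(ctx, 40)
        print(f"{ctx} tokens over 40 min -> at most {rate:.2f} tokens/s")
```

Any real tokenizer rate would have to sit below this bound to leave space for the conversation's text tokens.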

 
