Abstract
Sign language translation has historically been peripheral to mainstream machine translation research. In order to help converge the fields, we introduce FLEURS-ASL, an extension of the multiway parallel benchmarks FLORES (for text) and FLEURS (for speech) to support their first sign language (as video), American Sign Language, translated by 5 Certified Deaf Interpreters. FLEURS-ASL can be used to evaluate a variety of tasks -- primarily sentence- and discourse-level translation -- between ASL and 200 other languages as text, or 102 languages as speech. We provide baselines for tasks from ASL to English text using a unified modeling approach that incorporates timestamp tokens and previous text tokens in a 34-second context window, trained on random video clips from YouTube-ASL. This model meets or exceeds the performance of phrase-level baselines while supporting a multitude of new tasks. We also use FLEURS-ASL to show that multimodal frontier models have virtually no understanding of ASL, underscoring the importance of including sign languages in standard evaluation suites.