Spoken language translation has recently witnessed a resurgence in popularity, thanks to the development of end-to-end models and the creation of new corpora, such as Augmented LibriSpeech and MuST-C. Existing datasets either involve language pairs with English as the source language, cover very specific domains, or are low-resource. We introduce CoVoST, a multilingual speech-to-text translation corpus from 11 languages into English, diversified with over 11,000 speakers and over 60 accents. We describe the dataset creation methodology and provide empirical evidence of the quality of the data. We also provide initial benchmarks, including, to our knowledge, the first end-to-end many-to-one multilingual models for spoken language translation. CoVoST is released under the CC0 license and is free to use. We also provide additional evaluation data derived from Tatoeba under CC licenses.