Abstract
For language documentation initiatives, transcription is an expensive resource: one minute of audio is estimated to take an hour and a half of a linguist's work on average (Austin and Sallabank, 2013). Recently, collecting aligned translations in well-resourced languages has become a popular solution for ensuring posterior interpretability of the recordings (Adda et al., 2016). In this paper we investigate language-related impact in automatic approaches for computational language documentation. We translate the bilingual Mboshi-French parallel corpus (Godard et al., 2017) into four other languages, and we perform bilingual-rooted unsupervised word discovery. Our results hint at an impact of the well-resourced language on the quality of the output. However, by combining the information learned by different bilingual models, we are only able to marginally increase the quality of the segmentation.