Abstract
In topic identification (topic ID) on real-world unstructured audio, an audio instance of variable topic shifts is first broken into sequential segments, and each segment is independently classified. We first present a general purpose method for topic ID on spoken segments in low-resource languages, using a cascade of universal acoustic modeling, translation lexicons to English, and English-language topic classification. Next, instead of classifying each segment independently, we demonstrate that exploring the contextual dependencies across sequential segments can provide large improvements. In particular, we propose an attention-based contextual model which is able to leverage the contexts in a selective manner. We test both our contextual and non-contextual models on four LORELEI languages, and on all but one our attention-based contextual model significantly outperforms the context-independent models.