Abstract
Self-supervised learning (SSL) has proven vital for advancing research in natural language processing (NLP) and computer vision (CV). The paradigm pretrains a shared model on large volumes of unlabeled data and achieves state-of-the-art (SOTA) performance on various tasks with minimal adaptation. However, the speech processing community lacks a similar setup to systematically explore the paradigm. To bridge this gap, we introduce the Speech processing Universal PERformance Benchmark (SUPERB). SUPERB is a leaderboard to benchmark the performance of a shared model across a wide range of speech processing tasks with minimal architecture changes and labeled data. Among multiple usages of the shared model, we especially focus on extracting the representation learned from SSL due to its preferable re-usability. We present a simple framework to solve SUPERB tasks by learning task-specialized lightweight prediction heads on top of the frozen shared model. Our results demonstrate that the framework is promising, as SSL representations show competitive generalizability and accessibility across SUPERB tasks. We release SUPERB as a challenge with a leaderboard and a benchmark toolkit to fuel research in representation learning and general speech processing.
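The idea of training a lightweight prediction head on top of a frozen shared model can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `FrozenUpstream` is a hypothetical stand-in (a fixed random projection, not a pretrained SSL model), `LinearHead` is a plain softmax classifier, and the data are synthetic. It is not the SUPERB toolkit's API.

```python
import math
import random


class FrozenUpstream:
    """Hypothetical stand-in for a pretrained shared model; its weights are never updated."""

    def __init__(self, in_dim, feat_dim, seed=0):
        rng = random.Random(seed)
        self.weights = [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)]
                        for _ in range(feat_dim)]

    def extract(self, frames):
        # Project each frame, then mean-pool over time into one utterance vector.
        per_frame = [[math.tanh(sum(w * x for w, x in zip(row, frame)))
                      for row in self.weights] for frame in frames]
        return [sum(f[j] for f in per_frame) / len(per_frame)
                for j in range(len(self.weights))]


class LinearHead:
    """Lightweight task-specific head: softmax regression on frozen features."""

    def __init__(self, feat_dim, num_classes):
        self.W = [[0.0] * feat_dim for _ in range(num_classes)]
        self.b = [0.0] * num_classes

    def probs(self, feat):
        logits = [sum(w * f for w, f in zip(row, feat)) + b
                  for row, b in zip(self.W, self.b)]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(v - m) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def sgd_step(self, feat, label, lr=0.1):
        # Cross-entropy gradient w.r.t. the logits is (probs - one_hot).
        p = self.probs(feat)
        for c in range(len(self.W)):
            grad = p[c] - (1.0 if c == label else 0.0)
            for j in range(len(feat)):
                self.W[c][j] -= lr * grad * feat[j]
            self.b[c] -= lr * grad


def make_toy_data(rng, n_per_class=10, n_frames=5, in_dim=4):
    # Synthetic "utterances": class 0 frames centered at +1, class 1 at -1.
    data = []
    for label, center in ((0, 1.0), (1, -1.0)):
        for _ in range(n_per_class):
            frames = [[center + rng.gauss(0.0, 0.3) for _ in range(in_dim)]
                      for _ in range(n_frames)]
            data.append((frames, label))
    return data


def mean_loss(head, feats_labels):
    return sum(-math.log(head.probs(f)[y]) for f, y in feats_labels) / len(feats_labels)


rng = random.Random(0)
upstream = FrozenUpstream(in_dim=4, feat_dim=8)
head = LinearHead(feat_dim=8, num_classes=2)
weights_before = [row[:] for row in upstream.weights]

# Features are extracted once; only the head's parameters are trained.
features = [(upstream.extract(frames), label) for frames, label in make_toy_data(rng)]
loss_before = mean_loss(head, features)
for _ in range(20):
    rng.shuffle(features)
    for feat, label in features:
        head.sgd_step(feat, label)
loss_after = mean_loss(head, features)
print("head loss before/after training:", loss_before, loss_after)
```

Because the upstream is frozen, its features can be computed once and reused by any number of task heads, which is the re-usability property the abstract highlights.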