DABS: A Domain-Agnostic Benchmark for Self-Supervised Learning

  • 2021-11-23 18:22:14
  • Alex Tamkin, Vincent Liu, Rongfei Lu, Daniel Fein, Colin Schultz, Noah Goodman

Abstract

Self-supervised learning algorithms, including BERT and SimCLR, have enabled significant strides in fields like natural language processing, computer vision, and speech processing. However, these algorithms are domain-specific, meaning that new self-supervised learning algorithms must be developed for each new setting, including myriad healthcare, scientific, and multimodal domains. To catalyze progress toward domain-agnostic methods, we introduce DABS: a Domain-Agnostic Benchmark for Self-supervised learning. To perform well on DABS, an algorithm is evaluated on seven diverse domains: natural images, multichannel sensor data, English text, speech recordings, multilingual text, chest x-rays, and images with text descriptions. Each domain contains an unlabeled dataset for pretraining; the model is then scored based on its downstream performance on a set of labeled tasks in the domain. We also present e-Mix and ShED: two baseline domain-agnostic algorithms; their relatively modest performance demonstrates that significant progress is needed before self-supervised learning is an out-of-the-box solution for arbitrary domains. Code for benchmark datasets and baseline algorithms is available at https://github.com/alextamkin/dabs.

 
