Shaping Visual Representations with Language for Few-shot Classification

Abstract

Language is designed to convey useful information about the world, thus serving as a scaffold for efficient human learning. How can we let language guide representation learning in machine learning models? We explore this question in the setting of few-shot visual classification, proposing models which learn to perform visual classification while jointly predicting natural language task descriptions at train time. At test time, with no language available, we find that these language-influenced visual representations are more generalizable, compared to meta-learning baselines and approaches that explicitly use language as a bottleneck for classification.
