HyperTransformer: Model Generation for Supervised and Semi-Supervised Few-Shot Learning

Abstract

In this work we propose a HyperTransformer, a transformer-based model for few-shot learning that generates weights of a convolutional neural network (CNN) directly from support samples. Since the dependence of a small generated CNN model on a specific task is encoded by a high-capacity transformer model, we effectively decouple the complexity of the large task space from the complexity of individual tasks. Our method is particularly effective for small target CNN architectures where learning a fixed universal task-independent embedding is not optimal and better performance is attained when the information about the task can modulate all model parameters. For larger models we discover that generating the last layer alone allows us to produce competitive or better results than those obtained with state-of-the-art methods while being end-to-end differentiable. Finally, we extend our approach to a semi-supervised regime utilizing unlabeled samples in the support set and further improving few-shot performance.
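The idea of generating only the last layer from the support set can be illustrated with a toy sketch. This is not the authors' architecture: it assumes the support samples have already been embedded by some fixed feature extractor, and it replaces the transformer with a single label-keyed attention step that has no learned parameters. The names `generate_last_layer`, `classify`, and `sharpness` are hypothetical and chosen only for this illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def generate_last_layer(embeddings, labels, num_classes, sharpness=10.0):
    """Produce one weight vector per class from the labeled support set.

    Assumption for this sketch: each class "query" attends to the support
    samples whose label matches it, so the score is `sharpness` on a match
    and 0 otherwise. A real model would use learned projections instead.
    """
    dim = len(embeddings[0])
    weights = []
    for c in range(num_classes):
        scores = [sharpness if y == c else 0.0 for y in labels]
        attn = softmax(scores)
        # Attention output: weighted sum of support embeddings.
        w = [sum(a * e[d] for a, e in zip(attn, embeddings)) for d in range(dim)]
        weights.append(w)
    return weights


def classify(weights, query_embedding):
    """Apply the generated linear layer and return the highest-scoring class."""
    logits = [dot(w, query_embedding) for w in weights]
    return max(range(len(logits)), key=logits.__getitem__)


# Hypothetical 2-way, 2-shot task with 2-dimensional embeddings.
support = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.8]]
labels = [0, 0, 1, 1]
W = generate_last_layer(support, labels, num_classes=2)
print(classify(W, [0.8, 0.2]), classify(W, [0.1, 0.9]))
```

The point of the sketch is only that the classifier weights are a differentiable function of the support set rather than parameters trained per task; with a large `sharpness` this particular toy reduces to a soft class mean, which a learned transformer is free to depart from.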
