MoPD: Mixture-of-Prompts Distillation for Vision-Language Models

Abstract

Soft prompt learning methods are effective for adapting vision-language models (VLMs) to downstream tasks. Nevertheless, empirical evidence reveals that existing methods tend to overfit seen classes and exhibit degraded performance on unseen classes. This limitation is due to the inherent bias in the training data towards the seen classes. To address this issue, we propose a novel soft prompt learning method, named Mixture-of-Prompts Distillation (MoPD), which can effectively transfer useful knowledge from hand-crafted hard prompts (a.k.a. teacher prompts) to the learnable soft prompt (a.k.a. student prompt), thereby enhancing the generalization ability of soft prompts on unseen classes. Moreover, the proposed MoPD method utilizes a gating network that learns to select hard prompts used for prompt distillation. Extensive experiments demonstrate that the proposed MoPD method outperforms state-of-the-art baselines, especially on unseen classes.
