Prompt Tuning for Generative Multimodal Pretrained Models

Abstract

Prompt tuning has become a new paradigm for model tuning, and it has demonstrated success in natural language pretraining and even vision pretraining. In this work, we explore the transfer of prompt tuning to multimodal pretraining, with a focus on generative multimodal pretrained models instead of contrastive ones. Specifically, we implement prompt tuning on a unified sequence-to-sequence pretrained model that adapts to both understanding and generation tasks. Experimental results demonstrate that light-weight prompt tuning can achieve performance comparable to finetuning and surpass other light-weight tuning methods. Moreover, in comparison with finetuned models, the prompt-tuned models demonstrate improved robustness against adversarial attacks. We further find that experimental factors, including the prompt length, prompt depth, and reparameterization, have great impacts on model performance, and thus we empirically provide a recommendation for the setups of prompt tuning. Despite the observed advantages, we still find some limitations in prompt tuning, and we correspondingly point out directions for future studies. Code is available at https://github.com/OFA-Sys/OFA.
