On the Relationship between Skill Neurons and Robustness in Prompt Tuning

Abstract

Prompt Tuning is a popular parameter-efficient finetuning method for pre-trained large language models (PLMs). Recently, based on experiments with RoBERTa, it has been suggested that Prompt Tuning activates specific neurons in the transformer's feed-forward networks that are highly predictive and selective for the given task. In this paper, we study the robustness of Prompt Tuning in relation to these "skill neurons", using RoBERTa and T5. We show that prompts tuned for a specific task are transferable to tasks of the same type but are not very robust to adversarial data, with higher robustness for T5 than RoBERTa. At the same time, we replicate the existence of skill neurons in RoBERTa and further show that skill neurons also seem to exist in T5. Interestingly, the skill neurons of T5 determined on non-adversarial data are also among the most predictive neurons on the adversarial data, which is not the case for RoBERTa. We conclude that higher adversarial robustness may be related to a model's ability to activate the relevant skill neurons on adversarial data.
