SAKE: Steering Activations for Knowledge Editing

Abstract

As Large Language Models (LLMs) have been shown to memorize real-world facts, the need arises to update this knowledge in a controlled and efficient manner. Designed with these constraints in mind, Knowledge Editing (KE) approaches propose to alter specific facts in pretrained models. However, they have been shown to suffer from several limitations, including a lack of contextual robustness and a failure to generalize to logical implications of the edited fact. To overcome these issues, we propose SAKE, a steering activation method that models a fact to be edited as a distribution rather than as a single prompt. Leveraging Optimal Transport, SAKE alters the LLM's behavior over a whole fact-related distribution, defined by paraphrases and logical implications of the fact. Several numerical experiments demonstrate the effectiveness of this method: SAKE performs more robust edits than its existing counterparts.
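
The abstract does not specify which transport map SAKE computes. As a rough illustration of what "altering behavior over a whole fact-related distribution" with Optimal Transport can look like, the sketch below swaps in a standard textbook technique: the closed-form OT map between Gaussian approximations of two clouds of hidden activations, T(x) = mu_t + A (x - mu_s) with A = S^{-1/2} (S^{1/2} T S^{1/2})^{1/2} S^{-1/2}. The Gaussian assumption, the function names, and the random stand-in data are all hypothetical and are not taken from the paper.

```python
import numpy as np

# Hypothetical illustration: Gaussian closed-form OT map between two sets of
# activations. This is NOT claimed to be SAKE's actual procedure.


def _sqrtm_psd(mat: np.ndarray) -> np.ndarray:
    """Symmetric square root of a symmetric positive semi-definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    eigvals = np.clip(eigvals, 0.0, None)  # drop tiny negative round-off values
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T


def fit_gaussian_ot_map(source: np.ndarray, target: np.ndarray, eps: float = 1e-6):
    """Fit the OT map between Gaussian fits of two activation clouds.

    source, target: arrays of shape (n_samples, hidden_dim).
    Returns (A, mu_s, mu_t) such that T(x) = mu_t + (x - mu_s) @ A.
    """
    dim = source.shape[1]
    mu_s, mu_t = source.mean(axis=0), target.mean(axis=0)
    # Small ridge term keeps the source covariance invertible.
    cov_s = np.cov(source, rowvar=False) + eps * np.eye(dim)
    cov_t = np.cov(target, rowvar=False) + eps * np.eye(dim)
    s_half = _sqrtm_psd(cov_s)
    s_inv_half = np.linalg.inv(s_half)
    middle = _sqrtm_psd(s_half @ cov_t @ s_half)
    A = s_inv_half @ middle @ s_inv_half  # symmetric, so row-wise x @ A == A @ x
    return A, mu_s, mu_t


def apply_ot_map(acts: np.ndarray, A: np.ndarray, mu_s: np.ndarray, mu_t: np.ndarray):
    """Steer a batch of activations of shape (n, hidden_dim) with the fitted map."""
    return mu_t + (acts - mu_s) @ A


rng = np.random.default_rng(0)
# Hypothetical stand-ins for hidden states collected on paraphrases of a fact
# before the edit (source) and on prompts expressing the edited fact (target).
src = rng.normal(size=(500, 8))
tgt = rng.normal(loc=2.0, scale=0.5, size=(500, 8))
A, mu_s, mu_t = fit_gaussian_ot_map(src, tgt)
steered = apply_ot_map(src, A, mu_s, mu_t)
print(np.allclose(steered.mean(axis=0), tgt.mean(axis=0)))  # True
```

Because the map is fitted on a whole set of prompts rather than a single one, every source activation is moved, which is the intuition behind treating a fact as a distribution. A Gaussian map only matches first and second moments; richer transport plans (for example, entropic OT) would be needed to match non-Gaussian activation distributions.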
