Should We Really Edit Language Models? On the Evaluation of Edited Language Models

Abstract

Model editing has become an increasingly popular alternative for efficiently updating knowledge within language models. Current methods mainly focus on reliability, generalization, and locality, and many methods excel across these criteria. Some recent works disclose pitfalls of these editing methods, such as knowledge distortion or conflict. However, the general abilities of post-edited language models remain unexplored. In this paper, we perform a comprehensive evaluation of various editing methods and different language models, and report the following findings. (1) Existing editing methods lead to inevitable performance deterioration on general benchmarks, indicating that they can maintain the general abilities of the model only within a few dozen edits. When the number of edits grows slightly larger, the intrinsic knowledge structure of the model is disrupted or even completely damaged. (2) Instruction-tuned models are more robust to editing, showing a smaller performance drop on general knowledge after editing. (3) Larger language models are more resistant to editing than smaller ones. (4) The safety of the edited model is significantly weakened, even for safety-aligned models. Our findings indicate that current editing methods are only suitable for small-scale knowledge updates within language models, which motivates further research on more practical and reliable editing methods. The details of code and reproduction can be found at https://github.com/lqinfdim/EditingEvaluation.
