Model Editing at Scale leads to Gradual and Catastrophic Forgetting

Abstract

Editing knowledge in large language models is an attractive capability: it allows us to correct facts learnt incorrectly during pre-training, as well as update the model with an ever-growing list of new facts. While existing model editing techniques have shown promise, they are usually evaluated using metrics for reliability, specificity and generalization over one or a few edits. We argue that for model editing to have practical utility, we must be able to make multiple edits to the same model. With this in mind, we evaluate current model editing methods at scale, focusing on two state-of-the-art methods: ROME and MEMIT. We find that as the model is edited sequentially with multiple facts, it continually forgets previously edited facts and loses the ability to perform downstream tasks. This forgetting happens in two phases: an initial gradual but progressive forgetting phase, followed by an abrupt or catastrophic forgetting phase. Both gradual and catastrophic forgetting limit the usefulness of model editing methods at scale: the former makes model editing less effective as multiple edits are made to the model, while the latter caps the scalability of such methods. Our analysis also highlights other key limitations of ROME and MEMIT at scale. With our work, we push for the development and evaluation of model editing methods with scalability in mind.
