Abstract
Knowledge editing methods like MEMIT make data- and compute-efficient updates of factual knowledge, using a single sentence to update a fact and its consequences. What is often overlooked, however, is a "precomputation step", which incurs a one-time but significant computational cost. The authors of MEMIT originally precompute approximately 44 million hidden vectors per edited layer, which requires a forward pass over 44 million tokens. For GPT-J (6B), this precomputation step takes 36 hours on a single GPU, and for Llama2-7B it takes approximately 40 hours. This precomputation time also grows with model size. In this paper, we show that this excessive computational cost is unnecessary. Knowledge editing with MEMIT and related methods, such as ROME and EMMET, can be performed by precomputing a very small portion of the 44 million hidden vectors. We first present the theoretical minimum number of precomputed hidden vectors required for solutions of these editing methods to exist. We then show empirically that knowledge editing with these methods can be done by precomputing significantly fewer hidden vectors. Specifically, we show that the precomputation step can be done with less than 0.3% of the originally stipulated number of hidden vectors. This saves a significant amount of precomputation time and allows users to begin editing new models within a few minutes.
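The figures above can be checked with a short back-of-the-envelope calculation. The sketch below is illustrative only: it assumes that precomputation time scales linearly with the number of tokens processed, which the abstract does not state, and the function and constant names are hypothetical. Under that assumption, 0.3% of 44 million is about 132,000 hidden vectors per edited layer, and the 36-hour and 40-hour runs shrink to roughly 6.5 and 7.2 minutes, consistent with "a few minutes". Because the abstract says "less than 0.3%", these are upper bounds.

```python
# Rough estimate of the reduced precomputation budget.
# ASSUMPTION (not from the paper): time scales linearly with token count.

ORIGINAL_VECTORS = 44_000_000   # hidden vectors per edited layer (from the abstract)
FRACTION = 0.003                # "less than 0.3%" treated as an upper bound
FULL_HOURS = {                  # single-GPU precomputation times (from the abstract)
    "GPT-J (6B)": 36.0,
    "Llama2-7B": 40.0,
}


def reduced_budget(original_vectors: int, fraction: float) -> int:
    """Number of hidden vectors at the given fraction of the original budget."""
    return int(round(original_vectors * fraction))


def estimated_minutes(full_hours: float, fraction: float) -> float:
    """Estimated precomputation time in minutes under linear scaling."""
    return full_hours * 60.0 * fraction


reduced = reduced_budget(ORIGINAL_VECTORS, FRACTION)
estimates = {name: estimated_minutes(h, FRACTION) for name, h in FULL_HOURS.items()}

print(f"Reduced budget: at most {reduced:,} hidden vectors per edited layer")
for name, minutes in estimates.items():
    print(f"{name}: at most about {minutes:.1f} minutes")
```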