Abstract
When an LLM learns a new fact during finetuning (e.g., new movie releases or a newly elected pope), where does this information go? Are entities enriched with relation information, or do models recall information just-in-time before a prediction? Or are ``all of the above'' true, with LLMs implementing multiple redundant heuristics? Existing localization approaches (e.g., activation patching) are ill-suited for this analysis because they usually \textit{replace} parts of the residual stream, thus overriding previous information. To fill this gap, we propose \emph{dynamic weight grafting}, a technique that selectively grafts weights from a finetuned model onto a pretrained model. Using this technique, we show two separate pathways for retrieving finetuned relation information: 1) ``enriching'' the residual stream with relation information while processing the tokens that correspond to an entity (e.g., ``Zendaya'' in ``Zendaya co-starred with John David Washington'') and 2) ``recalling'' this information at the final token position before generating a target fact. In some cases, models need information from both of these pathways to correctly generate finetuned facts while, in other cases, either the ``enrichment'' or ``recall'' pathway alone is sufficient. We localize the ``recall'' pathway to model components, finding that ``recall'' occurs via both task-specific attention mechanisms and an entity-specific extraction step in the feedforward networks of the final layers before the target prediction. By targeting model components and parameters, as opposed to just activations, we are able to understand the \textit{mechanisms} by which finetuned knowledge is retrieved during generation.