Abstract
Previous studies show that introducing new knowledge during fine-tuning of large language models (LLMs) can lead to erroneous outputs when the models are tested on known information, thereby triggering factual hallucinations. However, existing studies have not deeply investigated the specific manifestations and underlying mechanisms of these hallucinations. Our work addresses this gap by designing a controlled dataset, Biography-Reasoning, and conducting a fine-grained analysis across multiple knowledge types and two task types: knowledge question answering (QA) and knowledge reasoning. We find that when fine-tuned on a dataset in which a specific knowledge type consists entirely of new knowledge, LLMs exhibit significantly increased hallucination tendencies. This suggests that the high unfamiliarity of a particular knowledge type, rather than the overall proportion of new knowledge, is a stronger driver of hallucinations, and these tendencies can even affect other knowledge types in QA tasks. To mitigate such factual hallucinations, we propose KnownPatch, which patches a small number of known knowledge samples into the later stages of training, effectively alleviating new-knowledge-induced hallucinations. Through attention analysis, we find that learning new knowledge reduces the model's attention to key entities in the question, causing excessive focus on the surrounding context, which may increase the risk of hallucination. Moreover, this attention pattern can propagate to similar contexts, facilitating the spread of hallucinations to textually similar questions. Our method effectively mitigates the disruption that new-knowledge learning causes to the model's attention on key entities, accompanied by improved performance.