Large Language Models Leverage External Knowledge to Extend Clinical Insight Beyond Language Boundaries

Abstract

$\textbf{Objectives}$: Large Language Models (LLMs) such as ChatGPT and Med-PaLM have excelled in various medical question-answering tasks. However, these English-centric models encounter challenges in non-English clinical settings, primarily because imbalanced training corpora leave them with limited clinical knowledge in the respective languages. We systematically evaluate LLMs in the Chinese medical context and develop a novel in-context learning framework to enhance their performance.

$\textbf{Materials and Methods}$: The latest China National Medical Licensing Examination (CNMLE-2022) served as the benchmark. We collected 53 medical books and 381,149 medical questions to construct the medical knowledge base and question bank. The proposed Knowledge and Few-shot Enhancement In-context Learning (KFE) framework leverages the in-context learning ability of LLMs to integrate diverse external clinical knowledge sources. We evaluated KFE with ChatGPT (GPT-3.5), GPT-4, Baichuan2 (BC2)-7B, and BC2-13B on CNMLE-2022 and investigated the effectiveness of different pathways for combining LLMs with medical knowledge from 7 perspectives.

$\textbf{Results}$: Applied directly, ChatGPT scored 51 and failed to qualify for CNMLE-2022. Combined with KFE, LLMs of varying sizes yielded consistent and significant improvements. ChatGPT's performance rose to 70.04, and GPT-4 achieved the highest score of 82.59, surpassing the qualification threshold (60) and exceeding the average human score of 68.70. KFE also enabled the smaller BC2-13B to pass the examination, demonstrating strong potential in low-resource settings.

$\textbf{Conclusion}$: By synergizing medical knowledge through in-context learning, LLMs can extend clinical insight beyond language barriers, significantly reducing language-related disparities in LLM applications and ensuring global benefit in healthcare.
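To make the idea of combining a knowledge base and a question bank through in-context learning more concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not describe how KFE retrieves or formats its context, so the token-overlap scoring, the prompt layout, and the names `overlap`, `top_k`, and `build_prompt` are all hypothetical choices made for illustration.

```python
from collections import Counter


def overlap(a: str, b: str) -> int:
    """Count shared whitespace-separated tokens.

    Assumption: a toy stand-in for whatever retriever a real system uses.
    """
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    return sum((ca & cb).values())


def top_k(query, items, k, key=lambda x: x):
    """Return the k items whose text overlaps most with the query."""
    return sorted(items, key=lambda it: overlap(query, key(it)), reverse=True)[:k]


def build_prompt(question, knowledge_base, question_bank, k_knowledge=2, k_shots=2):
    """Assemble a prompt from retrieved knowledge snippets and solved examples.

    knowledge_base: list of text snippets (hypothetical format).
    question_bank: list of (question, answer) pairs (hypothetical format).
    """
    snippets = top_k(question, knowledge_base, k_knowledge)
    shots = top_k(question, question_bank, k_shots, key=lambda qa: qa[0])
    parts = ["Relevant knowledge:"]
    parts += [f"- {s}" for s in snippets]
    parts.append("Solved examples:")
    parts += [f"Q: {q}\nA: {a}" for q, a in shots]
    # The target question goes last so the model completes the final answer.
    parts.append(f"Q: {question}\nA:")
    return "\n".join(parts)
```

The resulting string would be sent to an LLM as-is; the model sees both background knowledge and worked examples before the target question, which is the general pattern the abstract describes.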
