Large Language Models are In-Context Molecule Learners

Abstract

Large Language Models (LLMs) have demonstrated exceptional performance in biochemical tasks, especially the molecule-caption translation task, which aims to bridge the gap between molecules and natural language texts. However, previous methods for adapting LLMs to the molecule-caption translation task required extra domain-specific pre-training stages, suffered from weak alignment between molecular and textual spaces, or imposed stringent demands on the scale of LLMs. To address these challenges, we propose In-Context Molecule Adaptation (ICMA), a new paradigm that allows LLMs to learn the molecule-text alignment from context examples via In-Context Molecule Tuning. Specifically, ICMA incorporates the following three stages: Hybrid Context Retrieval, Post-retrieval Re-ranking, and In-Context Molecule Tuning. First, Hybrid Context Retrieval uses BM25 Caption Retrieval and Molecule Graph Retrieval to retrieve informative context examples. Next, Post-retrieval Re-ranking applies Sequence Reversal and Random Walk selection to further improve the quality of the retrieval results. Finally, In-Context Molecule Tuning unlocks the in-context learning and reasoning capability of LLMs with the retrieved examples and adapts the parameters of LLMs for better alignment between molecules and texts. Experimental results demonstrate that ICMA can empower LLMs to achieve state-of-the-art or comparable performance without extra training corpora or intricate structures, showing that LLMs are inherently in-context molecule learners.
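To make the retrieval stage concrete, the following is a minimal sketch of BM25 caption retrieval followed by context assembly. It is not the authors' implementation. The `BM25` class, the `build_prompt` helper, the prompt layout, the parameter values `k1=1.5` and `b=0.75`, and the three toy (SMILES, caption) pairs are all assumptions chosen for illustration. Molecule Graph Retrieval, Post-retrieval Re-ranking, and the tuning step are omitted.

```python
import math
from collections import Counter


def tokenize(text):
    # Whitespace tokenizer; an assumption, real systems may tokenize differently.
    return text.lower().split()


class BM25:
    """Plain Okapi BM25 over a small caption corpus (illustrative only)."""

    def __init__(self, corpus, k1=1.5, b=0.75):
        self.docs = [tokenize(d) for d in corpus]
        self.k1, self.b = k1, b
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        self.tfs = [Counter(d) for d in self.docs]
        # Document frequency: number of captions containing each term.
        df = Counter(t for d in self.docs for t in set(d))
        n = len(self.docs)
        # Non-negative IDF variant: log(1 + (n - df + 0.5) / (df + 0.5)).
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query, i):
        tf, dl = self.tfs[i], len(self.docs[i])
        total = 0.0
        for t in tokenize(query):
            if t not in tf:
                continue
            num = tf[t] * (self.k1 + 1)
            den = tf[t] + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[t] * num / den
        return total

    def top_k(self, query, k):
        # Indices of the k highest-scoring captions, best first.
        order = sorted(range(len(self.docs)), key=lambda i: self.score(query, i), reverse=True)
        return order[:k]


def build_prompt(pairs, indices, query_caption):
    # Hypothetical prompt layout: retrieved pairs first, then the open query.
    parts = [f"Caption: {pairs[i][1]}\nMolecule: {pairs[i][0]}" for i in indices]
    parts.append(f"Caption: {query_caption}\nMolecule:")
    return "\n\n".join(parts)


# Toy (SMILES, caption) pairs; illustrative data, not taken from the paper.
PAIRS = [
    ("CCO", "a primary alcohol that is ethane with a hydroxy group"),
    ("CC(=O)O", "a simple carboxylic acid"),
    ("c1ccccc1", "an aromatic hydrocarbon ring"),
]

bm25 = BM25([caption for _, caption in PAIRS])
query = "an aromatic ring hydrocarbon"
best = bm25.top_k(query, 1)
print(build_prompt(PAIRS, best, query))
```

In this sketch the retrieved pairs serve only as context for a caption-to-molecule query. In a full pipeline the resulting prompt would be the input on which the LLM's parameters are tuned.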
