TOMG-Bench: Evaluating LLMs on Text-based Open Molecule Generation

Abstract

In this paper, we propose the Text-based Open Molecule Generation Benchmark (TOMG-Bench), the first benchmark to evaluate the open-domain molecule generation capability of LLMs. TOMG-Bench encompasses a dataset of three major tasks: molecule editing (MolEdit), molecule optimization (MolOpt), and customized molecule generation (MolCustom). Each major task contains three subtasks, and each subtask comprises 5,000 test samples. Given the inherent complexity of evaluating open molecule generation, we also developed an automated evaluation system that measures both the quality and the accuracy of the generated molecules. Our comprehensive benchmarking of 25 LLMs reveals the current limitations of, as well as potential areas for improvement in, text-guided molecule discovery. Furthermore, we propose OpenMolIns, a specialized instruction tuning dataset built to address the challenges raised by TOMG-Bench. Fine-tuned on OpenMolIns, Llama3.1-8B outperforms all the open-source general LLMs, even surpassing GPT-3.5-turbo by 46.5% on TOMG-Bench. Our code and datasets are available at https://github.com/phenixace/TOMG-Bench.
