GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis

Abstract

Gene expression analysis holds the key to many biomedical discoveries, yet extracting insights from raw transcriptomic data remains formidable due to the complexity of multiple large, semi-structured files and the need for extensive domain expertise. Current automation approaches are often limited by either inflexible workflows that break down in edge cases or by fully autonomous agents that lack the necessary precision for rigorous scientific inquiry. GenoMAS charts a different course by presenting a team of LLM-based scientists that integrates the reliability of structured workflows with the adaptability of autonomous agents. GenoMAS orchestrates six specialized LLM agents through typed message-passing protocols, each contributing complementary strengths to a shared analytic canvas. At the heart of GenoMAS lies a guided-planning framework: programming agents unfold high-level task guidelines into Action Units and, at each juncture, elect to advance, revise, bypass, or backtrack, thereby maintaining logical coherence while bending gracefully to the idiosyncrasies of genomic data. On the GenoTEX benchmark, GenoMAS reaches a Composite Similarity Correlation of 89.13% for data preprocessing and an F$_1$ of 60.48% for gene identification, surpassing the best prior art by 10.61% and 16.85%, respectively. Beyond metrics, GenoMAS surfaces biologically plausible gene-phenotype associations corroborated by the literature, all while adjusting for latent confounders. Code is available at https://github.com/Liu-Hy/GenoMAS.
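
The guided-planning loop described above can be illustrated with a small sketch. This is not the GenoMAS implementation: the names `ActionUnit`, `Decision`, `choose`, and `execute_plan` are hypothetical, the unit names in the demo are invented, and the rule-based `choose` function is only a stand-in for the decision that, per the abstract, a programming agent makes at each juncture.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Tuple


class Decision(Enum):
    ADVANCE = "advance"      # step succeeded, move to the next Action Unit
    REVISE = "revise"        # retry the current Action Unit
    BYPASS = "bypass"        # skip an optional Action Unit
    BACKTRACK = "backtrack"  # return to the previous Action Unit


@dataclass
class ActionUnit:
    name: str
    run: Callable[[Dict], bool]  # hypothetical contract: returns True on success
    optional: bool = False


def choose(unit: ActionUnit, ok: bool, attempts: int, max_attempts: int) -> Decision:
    """Rule-based stand-in for the agent's choice (an assumption, not the paper's policy)."""
    if ok:
        return Decision.ADVANCE
    if attempts < max_attempts:
        return Decision.REVISE
    return Decision.BYPASS if unit.optional else Decision.BACKTRACK


def execute_plan(units: List[ActionUnit], max_attempts: int = 2,
                 max_steps: int = 50) -> List[Tuple[str, str]]:
    state: Dict = {}                 # shared working state passed to every unit
    trace: List[Tuple[str, str]] = []
    attempts = [0] * len(units)
    i = 0
    for _ in range(max_steps):       # hard cap so backtracking cannot loop forever
        if i >= len(units):
            break
        unit = units[i]
        attempts[i] += 1
        decision = choose(unit, unit.run(state), attempts[i], max_attempts)
        trace.append((unit.name, decision.value))
        if decision in (Decision.ADVANCE, Decision.BYPASS):
            i += 1
        elif decision is Decision.BACKTRACK:
            if i == 0:
                break                # nothing earlier to return to
            attempts[i] = 0
            attempts[i - 1] = 0
            i -= 1
        # Decision.REVISE: stay on the same unit and try again
    return trace


# Demo units (invented for illustration only).
def load(state: Dict) -> bool:
    state["loaded"] = True
    return True


def map_ids(state: Dict) -> bool:
    state["map_calls"] = state.get("map_calls", 0) + 1
    return state["map_calls"] > 1    # fails once, then succeeds


def normalize(state: Dict) -> bool:
    return False                     # always fails, but is optional


trace = execute_plan([
    ActionUnit("load", load),
    ActionUnit("map_ids", map_ids),
    ActionUnit("normalize", normalize, optional=True),
])
print(trace)
```

In the demo, the second unit is revised once before advancing, and the optional third unit is bypassed after exhausting its attempts. A required unit that keeps failing would instead trigger a backtrack to the preceding unit, bounded by `max_steps`.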
