VenusFactory: A Unified Platform for Protein Engineering Data Retrieval and Language Model Fine-Tuning

  • 2025-03-19 18:19:07
  • Yang Tan, Chen Liu, Jingyuan Gao, Banghao Wu, Mingchen Li, Ruilin Wang, Lingrong Zhang, Huiqun Yu, Guisheng Fan, Liang Hong, Bingxin Zhou

Abstract

Natural language processing (NLP) has significantly influenced scientific domains beyond human language, including protein engineering, where pre-trained protein language models (PLMs) have demonstrated remarkable success. However, interdisciplinary adoption remains limited due to challenges in data collection, task benchmarking, and application. This work presents VenusFactory, a versatile engine that integrates biological data retrieval, standardized task benchmarking, and modular fine-tuning of PLMs. VenusFactory supports both the computer science and biology communities by offering either command-line execution or a Gradio-based no-code interface, and it integrates $40+$ protein-related datasets and $40+$ popular PLMs. All implementations are open-sourced at https://github.com/tyang816/VenusFactory.

 
