VenusFactory: A Unified Platform for Protein Engineering Data Retrieval and Language Model Fine-Tuning

Abstract

Natural language processing (NLP) has significantly influenced scientific domains beyond human language, including protein engineering, where pre-trained protein language models (PLMs) have demonstrated remarkable success. However, interdisciplinary adoption remains limited due to challenges in data collection, task benchmarking, and application. This work presents VenusFactory, a versatile engine that integrates biological data retrieval, standardized task benchmarking, and modular fine-tuning of PLMs. VenusFactory supports both the computer science and biology communities by offering command-line execution as well as a Gradio-based no-code interface, and it integrates $40+$ protein-related datasets and $40+$ popular PLMs. All implementations are open-sourced at https://github.com/tyang816/VenusFactory.
