PepThink-R1: LLM for Interpretable Cyclic Peptide Optimization with CoT SFT and Reinforcement Learning

  • 2025-08-20 15:13:52
  • Ruheng Wang, Hang Zhang, Trieu Nguyen, Shasha Feng, Hao-Wei Pang, Xiang Yu, Li Xiao, Peter Zhiping Zhang

Abstract

Designing therapeutic peptides with tailored properties is hindered by the vastness of sequence space, limited experimental data, and poor interpretability of current generative models. To address these challenges, we introduce PepThink-R1, a generative framework that integrates large language models (LLMs) with chain-of-thought (CoT) supervised fine-tuning and reinforcement learning (RL). Unlike prior approaches, PepThink-R1 explicitly reasons about monomer-level modifications during sequence generation, enabling interpretable design choices while optimizing for multiple pharmacological properties. Guided by a tailored reward function balancing chemical validity and property improvements, the model autonomously explores diverse sequence variants. We demonstrate that PepThink-R1 generates cyclic peptides with significantly enhanced lipophilicity, stability, and exposure, outperforming existing general LLMs (e.g., GPT-5) and a domain-specific baseline in both optimization success and interpretability. To our knowledge, this is the first LLM-based peptide design framework that combines explicit reasoning with RL-driven property control, marking a step toward reliable and transparent peptide optimization for therapeutic discovery.

 
