Abstract
Designing therapeutic peptides with tailored properties is hindered by the vastness of sequence space, limited experimental data, and poor interpretability of current generative models. To address these challenges, we introduce PepThink-R1, a generative framework that integrates large language models (LLMs) with chain-of-thought (CoT) supervised fine-tuning and reinforcement learning (RL). Unlike prior approaches, PepThink-R1 explicitly reasons about monomer-level modifications during sequence generation, enabling interpretable design choices while optimizing for multiple pharmacological properties. Guided by a tailored reward function balancing chemical validity and property improvements, the model autonomously explores diverse sequence variants. We demonstrate that PepThink-R1 generates cyclic peptides with significantly enhanced lipophilicity, stability, and exposure, outperforming existing general LLMs (e.g., GPT-5) and a domain-specific baseline in both optimization success and interpretability. To our knowledge, this is the first LLM-based peptide design framework that combines explicit reasoning with RL-driven property control, marking a step toward reliable and transparent peptide optimization for therapeutic discovery.