Abstract
Large language models (LLMs) hold significant potential for mental health support: they can generate empathetic responses and simulate therapeutic conversations. However, existing LLM-based approaches often lack the clinical grounding necessary for real-world psychological counseling, particularly in explicit diagnostic reasoning aligned with standards such as the DSM/ICD and in incorporating diverse therapeutic modalities beyond basic empathy or single strategies. To address these limitations, we propose PsyLLM, the first large language model designed to systematically integrate both diagnostic and therapeutic reasoning for mental health counseling. To develop PsyLLM, we design a novel automated data synthesis pipeline that processes real-world mental health posts collected from Reddit, where users frequently share psychological distress and seek community support. The pipeline generates multi-turn dialogue structures and leverages LLMs guided by international diagnostic standards (e.g., DSM/ICD) and multiple therapeutic frameworks (e.g., CBT, ACT, psychodynamic) to simulate detailed clinical reasoning processes. Rigorous multi-dimensional filtering ensures that the resulting dialogue data are high-quality and clinically aligned. In addition, we introduce a new benchmark and evaluation protocol that assesses counseling quality across four key dimensions. Our experiments demonstrate that PsyLLM significantly outperforms state-of-the-art baseline models on this benchmark. The model weights and dataset have been publicly released at https://github.com/Emo-gml/PsyLLM.