Abstract
The fine-grained classification of brain tumor subtypes from histopathological whole slide images is highly challenging due to subtle morphological variations and the scarcity of annotated data. Although vision-language models have enabled promising zero-shot classification, their ability to capture fine-grained pathological features remains limited, resulting in suboptimal subtype discrimination. To address these challenges, we propose the Fine-Grained Patch Alignment Network (FG-PAN), a novel zero-shot framework tailored for digital pathology. FG-PAN consists of two key modules: (1) a local feature refinement module that enhances patch-level visual features by modeling spatial relationships among representative patches, and (2) a fine-grained text description generation module that leverages large language models to produce pathology-aware, class-specific semantic prototypes. By aligning refined visual features with LLM-generated fine-grained descriptions, FG-PAN effectively increases class separability in both visual and semantic spaces. Extensive experiments on multiple public pathology datasets, including EBRAINS and TCGA, demonstrate that FG-PAN achieves state-of-the-art performance and robust generalization in zero-shot brain tumor subtype classification.
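
To make the alignment step concrete, the following is a minimal sketch of zero-shot classification by cosine similarity between a slide-level visual feature and per-class text prototypes. It is not the FG-PAN implementation: plain mean pooling stands in for the local feature refinement module, each class score is assumed to be the average similarity over that class's description embeddings, and the function names, class labels (`subtype_a`, `subtype_b`), and toy 3-dimensional vectors are all hypothetical. In practice the embeddings would come from a vision-language model's image and text encoders.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def mean_pool(patch_features):
    """Average patch embeddings into one slide-level embedding.

    Assumption: simple mean pooling, used here only as a placeholder
    for a learned refinement/aggregation step.
    """
    n = len(patch_features)
    dim = len(patch_features[0])
    return [sum(p[i] for p in patch_features) / n for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def zero_shot_classify(patch_features, class_prototypes):
    """Return the best-matching class and all class scores.

    class_prototypes maps a class name to a list of text-description
    embeddings; the class score is the mean similarity over them
    (an assumption made for this sketch).
    """
    slide = mean_pool(patch_features)
    scores = {}
    for name, descriptions in class_prototypes.items():
        sims = [cosine(slide, d) for d in descriptions]
        scores[name] = sum(sims) / len(sims)
    best = max(scores, key=scores.get)
    return best, scores


# Toy data: two patch embeddings and two hypothetical classes,
# each with two description embeddings.
patches = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
prototypes = {
    "subtype_a": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "subtype_b": [[0.0, 1.0, 0.0], [0.1, 0.9, 0.1]],
}

predicted, scores = zero_shot_classify(patches, prototypes)
print(predicted, scores)
```

Because no labeled slides are used to fit anything, the prediction depends entirely on how well the visual and text embeddings separate the classes, which is the property the two modules described above aim to improve.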