Performance of GPT-5 in Brain Tumor MRI Reasoning

Abstract

Accurate differentiation of brain tumor types on magnetic resonance imaging (MRI) is critical for guiding treatment planning in neuro-oncology. Recent advances in large language models (LLMs) have enabled visual question answering (VQA) approaches that integrate image interpretation with natural language reasoning. In this study, we evaluated GPT-4o, GPT-5-nano, GPT-5-mini, and GPT-5 on a curated brain tumor VQA benchmark derived from three Brain Tumor Segmentation (BraTS) datasets: glioblastoma (GLI), meningioma (MEN), and brain metastases (MET). Each case included multi-sequence MRI triplanar mosaics and structured clinical features, which were transformed into standardized VQA items. Models were assessed in a zero-shot chain-of-thought setting for accuracy on both visual and reasoning tasks. GPT-5-mini achieved the highest macro-average accuracy (44.19%), followed by GPT-5 (43.71%), GPT-4o (41.49%), and GPT-5-nano (35.85%). Performance varied by tumor subtype, and no single model dominated across all cohorts. These findings suggest that GPT-5 family models can achieve moderate accuracy on structured neuro-oncological VQA tasks, but not at a level acceptable for clinical use.
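The abstract ranks models by macro-average accuracy. As a minimal sketch, assuming the macro average is the unweighted mean of per-group accuracies (here taken to be the GLI, MEN, and MET cohorts; the study may instead average over question types), the computation looks like this. The function name `macro_average_accuracy` and the `(group, correct)` record format are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def macro_average_accuracy(records):
    """Hypothetical helper: records is an iterable of (group, correct) pairs.

    Returns (per_group_accuracy, macro_average, micro_average).
    Assumption: macro average = unweighted mean of per-group accuracies.
    """
    totals = defaultdict(int)  # number of items per group
    hits = defaultdict(int)    # number of correctly answered items per group
    for group, correct in records:
        totals[group] += 1
        hits[group] += int(bool(correct))
    if not totals:
        raise ValueError("no records to score")
    per_group = {g: hits[g] / totals[g] for g in totals}
    # Macro: every group counts equally, regardless of how many items it has.
    macro = sum(per_group.values()) / len(per_group)
    # Micro (for contrast): every item counts equally, so large groups dominate.
    micro = sum(hits.values()) / sum(totals.values())
    return per_group, macro, micro


# Toy records for illustration only; these are not results from the study.
toy = [("GLI", True), ("GLI", True), ("GLI", True), ("GLI", False),
       ("MEN", True), ("MEN", False), ("MET", False)]
per_group, macro, micro = macro_average_accuracy(toy)
print(per_group)  # {'GLI': 0.75, 'MEN': 0.5, 'MET': 0.0}
print(round(macro, 3), round(micro, 3))  # 0.417 0.571
```

Under this assumption, macro averaging keeps a small cohort from being swamped by a larger one. That matters when cohort sizes differ and, as the abstract notes, performance varies by tumor subtype.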
