Abstract
Progress in image generation raises significant public security concerns. We argue that fake image detection should not operate as a "black box". Instead, an ideal approach must ensure both strong generalization and transparency. Recent progress in Multi-modal Large Language Models (MLLMs) offers new opportunities for reasoning-based AI-generated image detection. In this work, we evaluate the capabilities of MLLMs in comparison to traditional detection methods and human evaluators, highlighting their strengths and limitations. Furthermore, we design six distinct prompts and propose a framework that integrates these prompts to develop a more robust, explainable, and reasoning-driven detection system. The code is available at https://github.com/Gennadiyev/mllm-defake.