Abstract
Advances in generative AI models enable highly realistic video synthesis, amplifying misinformation risks on social media and eroding trust in digital content. Several research works have explored new deepfake detection methods for AI-generated images to alleviate these risks. However, with the rapid development of video generation models such as Sora and WanX, there is currently a lack of large-scale, high-quality AI-generated video datasets for forgery detection. In addition, existing detection approaches predominantly treat the task as binary classification, lacking explainability in model decision-making and failing to provide actionable insights or guidance for the public. To address these challenges, we propose \textbf{GenBuster-200K}, a large-scale AI-generated video dataset featuring 200K high-resolution video clips, a diverse set of the latest generative techniques, and real-world scenes. We further introduce \textbf{BusterX}, a novel AI-generated video detection and explanation framework that leverages a multimodal large language model (MLLM) and reinforcement learning for authenticity determination and explainable rationale. To our knowledge, GenBuster-200K is the {\it \textbf{first}} large-scale, high-quality AI-generated video dataset that incorporates the latest generative techniques for real-world scenarios. BusterX is the {\it \textbf{first}} framework to integrate an MLLM with reinforcement learning for explainable AI-generated video detection. Extensive comparisons with state-of-the-art methods and ablation studies validate the effectiveness and generalizability of BusterX. The code, models, and datasets will be released.