Abstract
The flourishing of video generation technologies has endangered the credibility of real-world information and intensified the demand for AI-generated video detectors. Despite some progress, the lack of high-quality real-world datasets hinders the development of trustworthy detectors. In this paper, we propose GenWorld, a large-scale, high-quality, and real-world simulation dataset for AI-generated video detection. GenWorld features the following characteristics: (1) Real-world Simulation: GenWorld focuses on videos that replicate real-world scenarios, which have a significant impact due to their realism and potential influence; (2) High Quality: GenWorld employs multiple state-of-the-art video generation models to provide realistic and high-quality forged videos; (3) Cross-prompt Diversity: GenWorld includes videos generated from diverse generators and various prompt modalities (e.g., text, image, video), offering the potential to learn more generalizable forensic features. We analyze existing methods and find they fail to detect high-quality videos generated by world models (i.e., Cosmos), revealing potential drawbacks of ignoring real-world clues. To address this, we propose a simple yet effective model, SpannDetector, to leverage multi-view consistency as a strong criterion for real-world AI-generated video detection. Experiments show that our method achieves superior results, highlighting a promising direction for explainable AI-generated video detection based on physical plausibility. We believe that GenWorld will advance the field of AI-generated video detection. Project Page: https://chen-wl20.github.io/GenWorld