Abstract
Inflammatory or misleading "fake" news content has become increasingly common in recent years. Simultaneously, it has become easier than ever to use AI tools to generate photorealistic images depicting any scene imaginable. Combining these two -- AI-generated fake news content -- is particularly potent and dangerous. To combat the spread of AI-generated fake news, we propose the MiRAGeNews Dataset, a dataset of 12,500 high-quality real and AI-generated image-caption pairs from state-of-the-art generators. We find that our dataset poses a significant challenge to humans (60% F-1) and state-of-the-art multi-modal LLMs (< 24% F-1). Using our dataset, we train a multi-modal detector (MiRAGe) that improves by +5.1% F-1 over state-of-the-art baselines on image-caption pairs from out-of-domain image generators and news publishers. We release our code and data to aid future work on detecting AI-generated content.