OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation

Abstract

Text-to-image (T2I) models have garnered significant attention for generating high-quality images aligned with text prompts. However, rapid advances in T2I models have exposed the limitations of early benchmarks, which lack comprehensive evaluation of capabilities such as reasoning, text rendering, and style. Notably, recent state-of-the-art models, with their rich knowledge modeling capabilities, show promising results on image generation problems that require strong reasoning ability, yet existing evaluation systems have not adequately addressed this frontier. To systematically address these gaps, we introduce OneIG-Bench, a meticulously designed, comprehensive benchmark framework for fine-grained evaluation of T2I models across multiple dimensions, including prompt-image alignment, text rendering precision, reasoning-generated content, stylization, and diversity. By structuring the evaluation, this benchmark enables in-depth analysis of model performance, helping researchers and practitioners pinpoint strengths and bottlenecks in the full pipeline of image generation. Specifically, OneIG-Bench enables flexible evaluation by allowing users to focus on a particular evaluation subset: instead of generating images for the entire set of prompts, users can generate images only for the prompts associated with the selected dimension and complete the corresponding evaluation accordingly. Our codebase and dataset are now publicly available to facilitate reproducible evaluation studies and cross-model comparisons within the T2I research community.
