Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM

Abstract

Creativity is a fundamental aspect of intelligence, involving the ability to generate novel and appropriate solutions across diverse contexts. While Large Language Models (LLMs) have been extensively evaluated for their creative capabilities, the assessment of Multimodal Large Language Models (MLLMs) in this domain remains largely unexplored. To address this gap, we introduce Creation-MMBench, a multimodal benchmark specifically designed to evaluate the creative capabilities of MLLMs in real-world, image-based tasks. The benchmark comprises 765 test cases spanning 51 fine-grained tasks. To ensure rigorous evaluation, we define instance-specific evaluation criteria for each test case, guiding the assessment of both general response quality and factual consistency with visual inputs. Experimental results reveal that current open-source MLLMs significantly underperform compared to proprietary models in creative tasks. Furthermore, our analysis demonstrates that visual fine-tuning can negatively impact the base LLM's creative abilities. Creation-MMBench provides valuable insights for advancing MLLM creativity and establishes a foundation for future improvements in multimodal generative intelligence. Full data and evaluation code are released at https://github.com/open-compass/Creation-MMBench.
