Abstract
Social reasoning abilities are crucial for AI systems to effectively interpret and respond to multimodal human communication and interaction within social contexts. We introduce Social Genome, the first benchmark for evaluating the fine-grained, grounded social reasoning abilities of multimodal models. Social Genome contains 272 videos of interactions and 1,486 human-annotated reasoning traces related to inferences about these interactions. These traces contain 5,777 reasoning steps that reference evidence from visual cues, verbal cues, vocal cues, and external knowledge (contextual knowledge external to the videos). Social Genome is also the first modeling challenge to study external knowledge in social reasoning. Social Genome provides metrics that holistically evaluate the semantic and structural qualities of model-generated social reasoning traces. We demonstrate the utility of Social Genome through experiments with state-of-the-art models, identifying performance gaps and opportunities for future research to improve the grounded social reasoning abilities of multimodal models.