Mixed Signals: Understanding Model Disagreement in Multimodal Empathy Detection

Abstract

Multimodal models play a key role in empathy detection, but their performance can suffer when modalities provide conflicting cues. To understand these failures, we examine cases where unimodal and multimodal predictions diverge. Using fine-tuned models for text, audio, and video, along with a gated fusion model, we find that such disagreements often reflect underlying ambiguity, as evidenced by annotator uncertainty. Our analysis shows that dominant signals in one modality can mislead fusion when unsupported by others. We also observe that humans, like models, do not consistently benefit from multimodal input. These insights position disagreement as a useful diagnostic signal for identifying challenging examples and improving empathy system robustness.
