Counterfactual VQA: A Cause-Effect Look at Language Bias

Abstract

VQA models tend to rely on language bias as a shortcut and thus fail to sufficiently learn the multi-modal knowledge from both vision and language. Recent debiasing methods propose excluding the language prior during inference. However, they fail to disentangle the "good" language context from the "bad" language bias. In this paper, we investigate how to mitigate language bias in VQA. Motivated by causal effects, we propose a novel counterfactual inference framework, which enables us to capture the language bias as the direct causal effect of questions on answers and to reduce the language bias by subtracting the direct language effect from the total causal effect. Experiments demonstrate that our proposed counterfactual inference framework 1) is general to various VQA backbones and fusion strategies, and 2) achieves competitive performance on the language-bias-sensitive VQA-CP dataset while performing robustly on the balanced VQA v2 dataset without any augmented data. The code is available at https://github.com/yuleiniu/cfvqa.
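
The subtraction described above can be illustrated with a minimal sketch. It is not taken from the released code: the three-branch split into question-only, vision-only, and fused logits, the sum-then-log-sigmoid fusion in `fuse`, the constant `c` standing in for a blocked branch, and all function names are assumptions made for illustration only.

```python
import numpy as np


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    return -np.logaddexp(0.0, -x)


def fuse(z_q, z_v, z_vq):
    # Assumed fusion: sum the branch logits, then apply log-sigmoid.
    return log_sigmoid(z_q + z_v + z_vq)


def debiased_scores(z_q, z_v, z_vq, c=0.0):
    """Total effect minus the direct effect of the question (hypothetical helper).

    z_q, z_v, z_vq: per-answer logits from the question-only, vision-only,
    and fused vision+question branches. `c` is an assumed constant that
    replaces a branch's logits when that branch is blocked.
    """
    z_q, z_v, z_vq = (np.asarray(z, dtype=float) for z in (z_q, z_v, z_vq))
    blocked = np.full_like(z_q, c)

    factual = fuse(z_q, z_v, z_vq)                 # question and image both seen
    question_only = fuse(z_q, blocked, blocked)    # only the question is seen
    baseline = fuse(blocked, blocked, blocked)     # nothing is seen

    total_effect = factual - baseline
    direct_language_effect = question_only - baseline
    return total_effect - direct_language_effect


# Toy case with two candidate answers: the question-only branch strongly
# favors answer 0, while the fused branch favors answer 1.
z_q = [2.0, 0.0]
z_v = [0.0, 0.0]
z_vq = [0.0, 1.5]
biased_choice = int(np.argmax(fuse(np.array(z_q), np.array(z_v), np.array(z_vq))))
debiased_choice = int(np.argmax(debiased_scores(z_q, z_v, z_vq)))
```

In this toy case the plain fused scores pick the answer favored by the question alone, and the choice changes once the direct language effect is subtracted. The baseline term cancels in the subtraction; it is kept only so that the two effects appear explicitly.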
