Overcoming Statistical Shortcuts for Open-ended Visual Counting

  • 2020-06-17 18:02:01
  • Corentin Dancette, Remi Cadene, Xinlei Chen, Matthieu Cord

Abstract

Machine learning models tend to over-rely on statistical shortcuts. These spurious correlations between parts of the input and the output labels do not hold in real-world settings. We target this issue on the recent open-ended visual counting task, which is well suited to studying statistical shortcuts. We aim to develop models that learn a proper mechanism of counting regardless of the output label. First, we propose the Modifying Count Distribution (MCD) protocol, which penalizes models that over-rely on statistical shortcuts. It is based on pairs of training and testing sets that do not follow the same count label distribution, such as the odd-even sets. Intuitively, models that have learned a proper mechanism of counting on odd numbers should perform well on even numbers. Second, we introduce the Spatial Counting Network (SCN), which is dedicated to visual analysis and counting based on natural language questions. Our model selects relevant image regions, scores them with fusion and self-attention mechanisms, and provides a final counting score. We apply our protocol to the recent TallyQA dataset and show superior performance compared to state-of-the-art models. We also demonstrate the ability of our model to select the correct instances to count in the image. Code and datasets are available at https://github.com/cdancette/spatial-counting-network
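
To make the odd-even idea behind the MCD protocol concrete, here is a minimal sketch of how such a train/test pair could be built. It is an illustration only, not the authors' implementation: the function names `odd_even_split` and `label_distribution`, the `"count"` field, and the toy samples are all hypothetical, and the released code may handle details (such as the label 0) differently.

```python
from collections import Counter


def odd_even_split(samples, train_parity="odd"):
    """Split samples into (train, test) by the parity of their count label.

    Hypothetical helper: assumes each sample is a dict with an integer
    "count" field. Training keeps one parity, testing keeps the other,
    so the two sets share no count label.
    """
    if train_parity not in ("odd", "even"):
        raise ValueError("train_parity must be 'odd' or 'even'")
    remainder = 1 if train_parity == "odd" else 0
    train = [s for s in samples if s["count"] % 2 == remainder]
    test = [s for s in samples if s["count"] % 2 != remainder]
    return train, test


def label_distribution(samples):
    """Return a Counter mapping each count label to its frequency."""
    return Counter(s["count"] for s in samples)


# Toy data, invented for illustration (not taken from TallyQA).
samples = [
    {"question": "How many dogs are there?", "count": 0},
    {"question": "How many people are sitting?", "count": 1},
    {"question": "How many cars are red?", "count": 2},
    {"question": "How many birds are flying?", "count": 3},
    {"question": "How many cups are on the table?", "count": 4},
    {"question": "How many chairs are empty?", "count": 5},
]

train, test = odd_even_split(samples, train_parity="odd")
print(sorted(label_distribution(train)))  # count labels seen in training
print(sorted(label_distribution(test)))   # count labels seen only at test time
```

Because the two label sets are disjoint, a model that merely memorizes which answers are frequent in training cannot score well at test time. Swapping `train_parity` to `"even"` gives the mirrored pair.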

 
