When less is more: Simplifying inputs aids neural network understanding

Abstract

How do neural network image classifiers respond to simpler and simpler inputs? And what do such responses reveal about the learning process? To answer these questions, we need a clear measure of input simplicity (or inversely, complexity), an optimization objective that correlates with simplification, and a framework to incorporate such an objective into training and inference. Lastly, we need a variety of testbeds to experiment with and evaluate the impact of such simplification on learning. In this work, we measure simplicity with the encoding bit size given by a pretrained generative model, and minimize the bit size to simplify inputs in training and inference. We investigate the effect of such simplification in several scenarios: conventional training, dataset condensation and post-hoc explanations. In all settings, inputs are simplified along with the original classification task, and we investigate the trade-off between input simplicity and task performance. For images with injected distractors, such simplification naturally removes superfluous information. For dataset condensation, we find that inputs can be simplified with almost no accuracy degradation. When used in post-hoc explanation, our learning-based simplification approach offers a valuable new tool to explore the basis of network decisions.
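
As a rough illustration of the trade-off described above, the sketch below converts a generative model's log-likelihood (in nats) into an encoding size in bits and adds it to a task loss. It is a minimal sketch and not the paper's implementation: the function names, the scalar inputs, and the trade-off weight `weight` are hypothetical, and the abstract does not specify how the two terms are actually combined.

```python
import math


def bits_from_log_likelihood(log_likelihood_nats: float) -> float:
    """Convert a log-likelihood in nats into an encoding size in bits.

    An input with likelihood p under the model needs about -log2(p) bits.
    """
    return -log_likelihood_nats / math.log(2)


def combined_objective(task_loss: float,
                       log_likelihood_nats: float,
                       weight: float = 0.01) -> float:
    """Task loss plus a weighted bit-size penalty.

    `weight` is an assumed hyperparameter: a larger value favors simpler
    inputs, a smaller value favors task performance.
    """
    bits = bits_from_log_likelihood(log_likelihood_nats)
    return task_loss + weight * bits
```

In this sketch, raising `weight` pushes the optimization toward inputs that cost fewer bits under the generative model, while lowering it prioritizes classification performance.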
