Natural Language Descriptions of Deep Visual Features

Abstract

Some neurons in deep networks specialize in recognizing highly specific perceptual, structural, or semantic features of inputs. In computer vision, techniques exist for identifying neurons that respond to individual concept categories like colors, textures, and object classes. But these techniques are limited in scope, labeling only a small subset of neurons and behaviors in any network. Is a richer characterization of neuron-level computation possible? We introduce a procedure (called MILAN, for mutual-information-guided linguistic annotation of neurons) that automatically labels neurons with open-ended, compositional, natural language descriptions. Given a neuron, MILAN generates a description by searching for a natural language string that maximizes pointwise mutual information with the image regions in which the neuron is active. MILAN produces fine-grained descriptions that capture categorical, relational, and logical structure in learned features. These descriptions obtain high agreement with human-generated feature descriptions across a diverse set of model architectures and tasks, and can aid in understanding and controlling learned models. We highlight three applications of natural language neuron descriptions. First, we use MILAN for analysis, characterizing the distribution and importance of neurons selective for attribute, category, and relational information in vision models. Second, we use MILAN for auditing, surfacing neurons sensitive to protected categories like race and gender in models trained on datasets intended to obscure these features. Finally, we use MILAN for editing, improving robustness in an image classifier by deleting neurons sensitive to text features spuriously correlated with class labels.
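
To make the selection criterion concrete, the sketch below scores a few candidate descriptions by the standard pointwise mutual information, log p(d | E) - log p(d), and keeps the highest-scoring one. It is a toy illustration only: the candidate strings, the probability values, and the helper names `pmi` and `best_description` are all hypothetical, and the abstract does not say how MILAN estimates these probabilities or searches over strings.

```python
import math


def pmi(p_desc_given_regions: float, p_desc: float) -> float:
    """Standard PMI between a description d and a set of regions E:
    log p(d | E) - log p(d)."""
    return math.log(p_desc_given_regions) - math.log(p_desc)


def best_description(candidates: dict) -> str:
    """Return the candidate whose PMI with the regions is largest.

    `candidates` maps a description string to a pair
    (p(d | E), p(d)); both values are assumed to be given.
    """
    return max(candidates, key=lambda d: pmi(*candidates[d]))


# Hypothetical probabilities, invented purely for illustration.
candidates = {
    "an object": (0.30, 0.25),              # likely, but generic
    "dog faces": (0.10, 0.01),              # rare in general, likely here
    "the top of a building": (0.02, 0.02),  # no more likely than usual
}

best = best_description(candidates)
print(best)
```

The point of the toy numbers is that PMI rewards descriptions that are specific to the regions: a generic phrase can be probable given the regions yet score low because it is probable everywhere.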
