Neural Stethoscopes: Unifying Analytic, Auxiliary and Adversarial Network Probing

Abstract

Model interpretability and systematic, targeted model adaptation present central tenets in machine learning for addressing limited or biased datasets. In this paper, we introduce neural stethoscopes as a framework for quantifying the degree of importance of specific factors of influence in deep networks as well as for actively promoting and suppressing information as appropriate. In doing so we unify concepts from multitask learning as well as training with auxiliary and adversarial losses. We showcase the efficacy of neural stethoscopes in an intuitive physics domain. Specifically, we investigate the challenge of visually predicting stability of block towers and demonstrate that the network uses visual cues which makes it susceptible to biases in the dataset. Through the use of stethoscopes we interrogate the accessibility of specific information throughout the network stack and show that we are able to actively de-bias network predictions as well as enhance performance via suitable auxiliary and adversarial stethoscope losses.
