Abstract
Despite impressive performance on numerous visual tasks, Convolutional Neural Networks (CNNs), unlike brains, are often highly sensitive to small perturbations of their input, e.g. adversarial noise leading to erroneous decisions. We propose to regularize CNNs using large-scale neuroscience data to learn more robust neural features in terms of representational similarity. We presented natural images to mice and measured the responses of thousands of neurons from cortical visual areas. Next, we denoised the notoriously variable neural activity using strong predictive models trained on this large corpus of responses from the mouse visual system, and calculated the representational similarity for millions of pairs of images from the model's predictions. We then used the neural representational similarity to regularize CNNs trained on image classification by penalizing intermediate representations that deviated from neural ones. This preserved the performance of baseline models on standard image classification benchmarks, while maintaining substantially higher performance than baseline or control models when classifying noisy images. Moreover, the models regularized with cortical representations were also more robust to adversarial attacks. This demonstrates that regularizing with neural data can be an effective tool to create an inductive bias towards more robust inference.
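The regularization idea can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the form of the penalty, or the weighting, so everything below is an assumption made for illustration: cosine similarity between feature vectors, a mean squared difference over image pairs, and a hypothetical weight `alpha`. The function names (`similarity_matrix`, `similarity_penalty`, `total_loss`) are likewise hypothetical and are not taken from the authors' code.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (assumes neither is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(features):
    """Pairwise similarity between the feature vectors of n images."""
    n = len(features)
    return [[cosine_similarity(features[i], features[j]) for j in range(n)]
            for i in range(n)]


def similarity_penalty(model_features, neural_similarity):
    """Mean squared difference between model and neural similarity.

    Averaged over distinct image pairs (i < j). The squared-error form
    is an assumption; the abstract only says that deviations from the
    neural representation are penalized.
    """
    model_sim = similarity_matrix(model_features)
    n = len(model_features)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum((model_sim[i][j] - neural_similarity[i][j]) ** 2
               for i, j in pairs) / len(pairs)


def total_loss(task_loss, model_features, neural_similarity, alpha=1.0):
    """Classification loss plus the weighted similarity penalty.

    alpha is a hypothetical weighting hyperparameter.
    """
    return task_loss + alpha * similarity_penalty(model_features, neural_similarity)
```

In this sketch, `model_features` would hold one intermediate CNN activation vector per image, and `neural_similarity` would hold the similarity matrix computed from the denoised neural predictions for the same images. The penalty is zero when the two similarity structures agree and grows as the CNN's representation drifts away from the neural one.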