Deep k-Nearest Neighbors: Towards Confident, Interpretable and Robust Deep Learning

Abstract

Deep neural networks (DNNs) enable innovative applications of machine learning like image recognition, machine translation, or malware detection. However, deep learning is often criticized for its lack of robustness in adversarial settings (e.g., vulnerability to adversarial inputs) and general inability to rationalize its predictions. In this work, we exploit the structure of deep learning to enable new learning-based inference and decision strategies that achieve desirable properties such as robustness and interpretability. We take a first step in this direction and introduce the Deep k-Nearest Neighbors (DkNN). This hybrid classifier combines the k-nearest neighbors algorithm with representations of the data learned by each layer of the DNN: a test input is compared to its neighboring training points according to the distance that separates them in the representations. We show that the labels of these neighboring points afford confidence estimates for inputs outside the model's training manifold, including malicious inputs like adversarial examples, and thereby provide protection against inputs that are outside the model's understanding. This is because the nearest neighbors can be used to estimate the nonconformity of, i.e., the lack of support for, a prediction in the training data. The neighbors also constitute human-interpretable explanations of predictions. We evaluate the DkNN algorithm on several datasets, and show that the confidence estimates accurately identify inputs outside the model, and that the explanations provided by nearest neighbors are intuitive and useful in understanding model failures.
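The idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several pieces are assumptions made only for illustration: the "DNN" is a toy stack of fixed random ReLU layers (`layer_outputs`), neighbors are found by brute-force Euclidean distance, nonconformity for a candidate label is taken to be the number of neighbors across all layers whose label differs from it, and that count is turned into an empirical p-value against a held-out calibration set in the usual conformal-prediction style. All function and variable names are hypothetical.

```python
import numpy as np

def layer_outputs(x, weights):
    """Representations of x at each layer of a toy ReLU network (stand-in for a trained DNN)."""
    reps, h = [], x
    for w in weights:
        h = np.maximum(h @ w, 0.0)
        reps.append(h)
    return reps

def knn_labels(train_rep, train_labels, query_rep, k):
    """Labels of the k training points nearest to each query in one layer's representation."""
    # Pairwise squared Euclidean distances, shape (n_query, n_train).
    d = ((query_rep[:, None, :] - train_rep[None, :, :]) ** 2).sum(axis=-1)
    idx = np.argsort(d, axis=1)[:, :k]
    return train_labels[idx]

def nonconformity(train_reps, train_labels, query_reps, k, n_classes):
    """For each query and candidate label j, count neighbors over all layers whose label is not j."""
    counts = np.zeros((query_reps[0].shape[0], n_classes), dtype=int)
    for train_rep, query_rep in zip(train_reps, query_reps):
        neigh = knn_labels(train_rep, train_labels, query_rep, k)  # (n_query, k)
        for j in range(n_classes):
            counts[:, j] += (neigh != j).sum(axis=1)
    return counts

def dknn_predict(train_reps, train_labels, calib_scores, query_reps, k, n_classes):
    """Return predicted labels, their empirical p-values, and the raw nonconformity counts."""
    alpha = nonconformity(train_reps, train_labels, query_reps, k, n_classes)
    # p-value of label j: fraction of calibration scores at least as large as alpha[:, j].
    p = (calib_scores[None, None, :] >= alpha[:, :, None]).mean(axis=-1)
    return p.argmax(axis=1), p.max(axis=1), alpha

# Toy data (assumption): two Gaussian blobs in 4 dimensions, one per class.
rng = np.random.default_rng(0)
n_classes, k = 2, 5
x_train = np.concatenate([rng.normal(-2, 1, (40, 4)), rng.normal(2, 1, (40, 4))])
y_train = np.repeat([0, 1], 40)
x_calib = np.concatenate([rng.normal(-2, 1, (20, 4)), rng.normal(2, 1, (20, 4))])
y_calib = np.repeat([0, 1], 20)
x_test = rng.normal(2, 1, (3, 4))

weights = [rng.normal(size=(4, 8)), rng.normal(size=(8, 8))]  # untrained, illustration only
train_reps = layer_outputs(x_train, weights)
calib_reps = layer_outputs(x_calib, weights)
test_reps = layer_outputs(x_test, weights)

# Calibration scores: nonconformity of each calibration point with respect to its true label.
calib_alpha = nonconformity(train_reps, y_train, calib_reps, k, n_classes)
calib_scores = calib_alpha[np.arange(len(y_calib)), y_calib]

pred, cred, alpha = dknn_predict(train_reps, y_train, calib_scores, test_reps, k, n_classes)
print("predicted labels:", pred)
print("p-value of prediction:", cred)
```

A low p-value for the predicted label means that few calibration points were as poorly supported by their neighbors as this input is, which is the sense in which the neighbors signal a lack of support in the training data. The neighbor indices computed in `knn_labels` could also be returned so the nearest training points can be shown as an explanation of the prediction.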
