Abstract
This paper reviews recent studies in emerging directions of understanding neural-network representations and learning neural networks with interpretable/disentangled middle-layer representations. Although deep neural networks have exhibited superior performance in various tasks, interpretability has always been an Achilles' heel of deep neural networks. At present, deep neural networks obtain high discrimination power at the cost of low interpretability of their black-box representations. We believe that high model interpretability may help people break several bottlenecks of deep learning, e.g., learning from very few annotations, learning via human-computer communications at the semantic level, and semantically debugging network representations. In this paper, we focus on convolutional neural networks (CNNs), and we revisit the visualization of CNN representations, methods of diagnosing representations of pre-trained CNNs, approaches for disentangling pre-trained CNN representations, learning of CNNs with disentangled representations, and middle-to-end learning based on model interpretability. Finally, we discuss prospective trends of explainable artificial intelligence.