Abstract
This paper reviews recent studies on understanding neural-network representations and on learning neural networks with interpretable/disentangled middle-layer representations. Although deep neural networks have exhibited superior performance in various tasks, interpretability has always been their Achilles' heel. At present, deep neural networks obtain high discrimination power at the cost of low interpretability of their black-box representations. We believe that high model interpretability may help people break several bottlenecks of deep learning, e.g., learning from very few annotations, learning via human-computer communication at the semantic level, and semantically debugging network representations. We focus on convolutional neural networks (CNNs), and we revisit the visualization of CNN representations, methods of diagnosing representations of pre-trained CNNs, approaches for disentangling pre-trained CNN representations, learning of CNNs with disentangled representations, and middle-to-end learning based on model interpretability. Finally, we discuss prospective trends in explainable artificial intelligence.