Existing deep convolutional neural networks (CNNs) generate massive interlayer feature data during network inference. To maintain real-time processing in embedded systems, large on-chip memory is required to buffer the interlayer feature maps. In this paper, we propose an efficient hardware accelerator with an interlayer feature compression technique that significantly reduces the required on-chip memory size and off-chip memory access bandwidth. The accelerator compresses interlayer feature maps by transforming the stored data into the frequency domain using a hardware-implemented 8x8 discrete cosine transform (DCT). The high-frequency components are removed after the DCT through quantization. Sparse matrix compression is then applied to further compress the interlayer feature maps. The on-chip memory allocation scheme is designed to support dynamic configuration of the feature map buffer size and scratch pad size according to the requirements of different network layers. The hardware accelerator combines compression, decompression, and CNN acceleration into one computing stream, achieving minimal compression and processing delay. A prototype accelerator is implemented on an FPGA platform and also synthesized in TSMC 28-nm CMOS technology. It achieves 403 GOPS peak throughput and a 1.4x to 3.3x reduction in interlayer feature map size with only a small hardware area overhead, making it a promising hardware accelerator for intelligent IoT devices.
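
The compression path described above (8x8 DCT, quantization that discards high-frequency components, then sparse encoding of the remaining coefficients) can be illustrated in software. The following is a minimal Python sketch, not the accelerator's implementation. The frequency-dependent step size (`base_step`, `slope`) and the bitmap-plus-values sparse format are assumptions chosen for illustration, since the abstract does not specify the quantization table or the sparse storage format. The sketch also uses floating-point arithmetic, whereas a hardware design would typically use fixed-point.

```python
import math

N = 8  # block size of the DCT named in the abstract


def dct_matrix(n=N):
    """Orthonormal DCT-II basis matrix C, so that Y = C X C^T."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain dense matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


C = dct_matrix()
CT = transpose(C)


def dct2(block):
    """2-D DCT of an 8x8 block (separable: rows, then columns)."""
    return matmul(matmul(C, block), CT)


def idct2(coeffs):
    """Inverse 2-D DCT; C is orthonormal, so its inverse is C^T."""
    return matmul(matmul(CT, coeffs), C)


def step(u, v, base_step, slope):
    """Hypothetical step size that grows with frequency index u + v."""
    return base_step + slope * (u + v)


def quantize(coeffs, base_step=1.0, slope=1.0):
    """Divide by the step and round; small high-frequency terms become 0."""
    return [[int(round(coeffs[u][v] / step(u, v, base_step, slope)))
             for v in range(N)] for u in range(N)]


def dequantize(q, base_step=1.0, slope=1.0):
    return [[q[u][v] * step(u, v, base_step, slope)
             for v in range(N)] for u in range(N)]


def sparse_encode(q):
    """Assumed sparse format: 64-bit occupancy mask plus nonzero values."""
    mask, values = 0, []
    for i, v in enumerate(v for row in q for v in row):
        if v != 0:
            mask |= 1 << i
            values.append(v)
    return mask, values


def sparse_decode(mask, values):
    """Rebuild the 8x8 quantized block from the mask and value list."""
    it = iter(values)
    flat = [next(it) if (mask >> i) & 1 else 0 for i in range(N * N)]
    return [flat[r * N:(r + 1) * N] for r in range(N)]
```

A block is compressed with `sparse_encode(quantize(dct2(block)))` and restored with `idct2(dequantize(sparse_decode(mask, values)))`. Smooth feature-map blocks concentrate their energy in a few low-frequency coefficients, so the value list is short. The reconstruction is lossy, with the error governed by the step sizes.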