The Heterogeneity Hypothesis: Finding Layer-Wise Dissimilated Network Architecture

Abstract

In this paper, we tackle the problem of convolutional neural network design. Instead of focusing on the overall architecture design, we investigate a design space that is usually overlooked, i.e., adjusting the channel configurations of predefined networks. We find that this adjustment can be achieved by pruning widened baseline networks and leads to superior performance. Based on that, we articulate the "heterogeneity hypothesis": with the same training protocol, there exists a layer-wise dissimilated network architecture (LW-DNA) that can outperform the original network with regular channel configurations at a lower level of model complexity. The LW-DNA models are identified without added computational cost or training time compared with the original network. This constraint leads to controlled experiments, which direct the focus to the importance of layer-wise specific channel configurations. Multiple sources of hints relate the benefits of LW-DNA models to overfitting, i.e., the relative relationship between model complexity and dataset size. Experiments are conducted on various networks and datasets for image classification, visual tracking, and image restoration. The resultant LW-DNA models consistently outperform the compared baseline models.
