Searching for A Robust Neural Architecture in Four GPU Hours

Abstract

Conventional neural architecture search (NAS) approaches are based on reinforcement learning or evolutionary strategy, which take more than 3000 GPU hours to find a good model on CIFAR-10. We propose an efficient NAS approach that learns to search by gradient descent. Our approach represents the search space as a directed acyclic graph (DAG). This DAG contains billions of sub-graphs, each of which indicates a kind of neural architecture. To avoid traversing all the possibilities of the sub-graphs, we develop a differentiable sampler over the DAG. This sampler is learnable and optimized by the validation loss after training the sampled architecture. In this way, our approach can be trained in an end-to-end fashion by gradient descent, named Gradient-based search using Differentiable Architecture Sampler (GDAS). In experiments, we can finish one searching procedure in four GPU hours on CIFAR-10, and the discovered model obtains a test error of 2.82% with only 2.5M parameters, which is on par with the state-of-the-art. Code is publicly available on GitHub: https://github.com/D-X-Y/NAS-Projects.
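
The abstract does not say how the sampler is made differentiable. One common way to relax a discrete choice is the Gumbel-softmax trick, and the sketch below assumes that technique purely for illustration. It shows the forward pass for a single DAG edge: learnable logits are perturbed with Gumbel noise, a temperature-scaled softmax gives soft probabilities, and the argmax gives the one operation that is actually used. The function name, the candidate operation names, the logits, and the temperature `tau` are all hypothetical and not taken from the paper or its code.

```python
import math
import random


def gumbel_softmax_sample(logits, tau=1.0, rng=random):
    """Sample one candidate operation for a DAG edge (illustrative sketch).

    Returns (hard, soft): `hard` is a one-hot list selecting the sampled
    operation, `soft` is the relaxed probability vector it was taken from.
    """
    noisy = []
    for a in logits:
        # Gumbel(0, 1) noise is -log(-log(U)) with U ~ Uniform(0, 1);
        # U is clamped away from 0 so the inner log is always defined.
        u = max(rng.random(), 1e-12)
        g = -math.log(-math.log(u))
        noisy.append((a + g) / tau)

    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    m = max(noisy)
    exps = [math.exp(x - m) for x in noisy]
    total = sum(exps)
    soft = [e / total for e in exps]

    # The forward pass keeps only the argmax operation (one-hot selection).
    k = max(range(len(soft)), key=soft.__getitem__)
    hard = [1.0 if i == k else 0.0 for i in range(len(soft))]
    return hard, soft


if __name__ == "__main__":
    ops = ["conv3x3", "skip", "avg_pool"]   # hypothetical candidate operations
    logits = [0.5, 0.1, -0.3]               # hypothetical learnable parameters
    hard, soft = gumbel_softmax_sample(logits, tau=0.5, rng=random.Random(0))
    print("sampled op:", ops[hard.index(1.0)])
    print("relaxed probabilities:", [round(p, 3) for p in soft])
```

In a framework with automatic differentiation, the one-hot vector would typically be used in the forward pass while gradients are taken through the soft probabilities, so the validation loss can update the logits. Because only the sampled operation is evaluated, each step trains a single sub-graph rather than the whole DAG.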
