Pruning is a popular technique for compressing a neural network: a large pre-trained network is fine-tuned while connections are successively removed. However, the value of pruning has largely evaded scrutiny. In this extended abstract, we examine residual networks obtained through Fisher-pruning and make two interesting observations. First, when time-constrained, it is better to train a simple, smaller network from scratch than to prune a large network. Second, it is the architectures obtained through the pruning process, not the learnt weights, that prove valuable. Such architectures are powerful when trained from scratch. Furthermore, these architectures are easy to approximate without any further pruning: we can prune once and obtain a family of new, scalable network architectures for different memory requirements.