This paper shows how to train binary networks to within a few percentage points ($\sim 3$--$5\%$) of their full-precision counterparts. We first show how to build a strong baseline, which already achieves state-of-the-art accuracy, by combining recently proposed advances and carefully adjusting the optimization procedure. Second, we show that additional significant accuracy gains can be obtained by minimizing the discrepancy between the output of the binary convolution and that of the corresponding real-valued convolution. We realize this idea in two complementary ways: (1) with a loss function, during training, by matching the spatial attention maps computed at the output of the binary and real-valued convolutions, and (2) in a data-driven manner, by using the real-valued activations, available during inference prior to the binarization process, for re-scaling the activations right after the binary convolution. Finally, we show that, when putting all of our improvements together, the proposed model beats the current state of the art by more than 5% top-1 accuracy on ImageNet and reduces the gap to its real-valued counterpart to less than 3% and 5% top-1 accuracy on CIFAR-100 and ImageNet, respectively, when using a ResNet-18 architecture. Code available at https://github.com/brais-martinez/real2binary.
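
As a rough illustration of idea (1), the sketch below computes a spatial attention map for a feature tensor and measures the distance between the maps of a binary and a real-valued convolution output. It assumes the attention map is the channel-wise sum of squared activations, L2-normalised over spatial positions, as is common in attention-transfer methods; the abstract does not fix the exact form. The function names `attention_map` and `attention_matching_loss` are hypothetical, and plain nested lists stand in for framework tensors.

```python
import math


def attention_map(features):
    """Spatial attention map of a feature tensor indexed [channel][row][col].

    Assumption: the map is the sum of squared activations over channels,
    flattened and L2-normalised over spatial positions.
    """
    rows, cols = len(features[0]), len(features[0][0])
    flat = [
        sum(channel[r][c] ** 2 for channel in features)
        for r in range(rows)
        for c in range(cols)
    ]
    # Fall back to 1.0 so an all-zero feature map does not divide by zero.
    norm = math.sqrt(sum(v * v for v in flat)) or 1.0
    return [v / norm for v in flat]


def attention_matching_loss(binary_out, real_out):
    """L2 distance between the normalised attention maps of two tensors."""
    a = attention_map(binary_out)
    b = attention_map(real_out)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy 1-channel, 2x2 outputs (hypothetical values, for illustration only).
binary_out = [[[1.0, -1.0], [1.0, 1.0]]]
real_out = [[[2.0, 0.0], [0.0, 0.0]]]
loss = attention_matching_loss(binary_out, real_out)
```

Because each map is normalised, the loss depends only on where the activation energy is concentrated spatially, not on its overall magnitude. In training, a term of this kind would presumably be summed over the matched layers and added to the task loss.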