Deep neural networks usually have to be compressed and accelerated for use in low-power devices, e.g. mobile devices. Recently, massively parallel hardware accelerators have been developed that offer high throughput and low latency at low power by utilizing in-memory computation. However, to exploit these benefits, the computational graph of a neural network has to fit into the in-computation memory of these hardware systems, which is usually rather limited in size. In this study, we introduce a class of network models that have a small memory footprint in terms of their computational graphs. To this end, the graph is designed to contain loops by iteratively executing a single network building block. Furthermore, the trade-off between accuracy and latency of these so-called iterative neural networks is improved by adding multiple intermediate outputs during both training and inference. We show state-of-the-art results for semantic segmentation on the CamVid and Cityscapes datasets, which are especially demanding in terms of computational resources. In ablation studies, we investigate how intermediate network outputs improve network training, as well as the trade-off between weight sharing over iterations and the network size.
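The core idea, one building block executed repeatedly with an output available after each iteration, can be illustrated with a minimal sketch. This is a toy stand-in and not the architecture studied here: the block is a plain linear map followed by a ReLU, the intermediate output is simply the state after each iteration rather than the result of a dedicated output layer, and the names `make_block`, `apply_block`, `iterative_forward` and `count_parameters` are hypothetical.

```python
import random


def make_block(dim, rng):
    """Create the weights of one toy building block (a dim x dim matrix)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]


def apply_block(weights, x):
    """Toy building block (assumption): linear map followed by a ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]


def iterative_forward(weights, x, n_iterations):
    """Execute the same block n_iterations times.

    The state after every iteration is collected, standing in for the
    intermediate outputs used during training and inference.
    """
    state = list(x)
    intermediate_outputs = []
    for _ in range(n_iterations):
        state = apply_block(weights, state)  # same weights in every iteration
        intermediate_outputs.append(list(state))
    return intermediate_outputs


def count_parameters(dim, n_iterations, shared):
    """Weights needed with sharing (one block) or without (one block per step)."""
    per_block = dim * dim
    return per_block if shared else per_block * n_iterations


rng = random.Random(0)
weights = make_block(8, rng)
x = [rng.uniform(0.0, 1.0) for _ in range(8)]
outputs = iterative_forward(weights, x, n_iterations=4)

print(len(outputs))                          # one output per iteration: 4
print(count_parameters(8, 4, shared=True))   # 64
print(count_parameters(8, 4, shared=False))  # 256
```

In this sketch the number of weights stays constant as the number of iterations grows, whereas an unrolled network without weight sharing grows linearly. Because an output exists after every iteration, inference could stop early at the cost of accuracy, which is the accuracy-latency trade-off referred to above.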