Abstract
Recently, the field of deep learning has received great attention from the scientific community, and it is used to provide improved solutions to many computer vision problems. Convolutional neural networks (CNNs) have been successfully used to attack problems such as object recognition, object detection, semantic segmentation, and scene understanding. The rapid development of deep learning goes hand in hand with the adoption of GPUs for accelerating its processes, such as network training and inference. Even though FPGA design existed long before the use of GPUs for accelerating computations, and despite the fact that high-level synthesis (HLS) tools are becoming more attractive, the adoption of FPGAs for deep learning research and application development is poor due to the hardware design expertise it requires. This work presents a workflow for deep learning mobile application acceleration on small, low-cost, low-power FPGA devices using HLS tools. This workflow eases the design of an improved version of the SqueezeJet accelerator, used to speed up mobile-friendly, low-parameter, ImageNet-class CNNs such as SqueezeNet v1.1 and ZynqNet. Additionally, the workflow includes the development of an HLS-driven analytical model that is used for performance estimation of the accelerator. This model can also be used to direct the design process and lead to future design improvements and optimizations.