In this paper, we propose a multi-task convolutional neural network (CNN) architecture optimized for a low-power automotive-grade SoC. We introduce a network based on a unified architecture in which the encoder is shared between two tasks, namely detection and segmentation. The proposed network runs at 25 FPS at 1280x800 resolution. We briefly discuss the methods used to optimize the network architecture, such as using the native YUV image directly, optimizing layers and feature maps, and applying quantization. We also focus on memory bandwidth in our design, as convolutions are data intensive and most SoCs are bandwidth bottlenecked. We then demonstrate the efficiency of our proposed network on a dedicated CNN accelerator, presenting the key performance indicators (KPIs) for the detection and segmentation tasks obtained from hardware execution, together with the corresponding run-time.
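
To give a rough sense of scale for the memory-bandwidth concern, the following minimal sketch estimates the traffic needed just to read the input frames at the stated 1280x800 resolution and 25 FPS. The abstract does not specify the YUV layout or bit depth, so the 8-bit YUV 4:2:0 format, the RGB888 comparison, and the helper names `bytes_per_frame` and `input_bandwidth_mb_s` are all assumptions made for illustration. The estimate covers input reads only and ignores weights and intermediate feature maps, which typically account for much more traffic.

```python
# Back-of-the-envelope input-bandwidth estimate for the camera stream.
# Assumptions (not from the paper): 8-bit samples, YUV 4:2:0 subsampling,
# and RGB888 as the comparison format. Helper names are hypothetical.

WIDTH, HEIGHT, FPS = 1280, 800, 25  # resolution and frame rate quoted in the abstract


def bytes_per_frame(width: int, height: int, fmt: str) -> int:
    """Return the size in bytes of one 8-bit frame in the given pixel format."""
    pixels = width * height
    if fmt == "yuv420":
        # Full-resolution Y plane plus U and V planes, each subsampled 2x2.
        return pixels + 2 * (width // 2) * (height // 2)
    if fmt == "rgb888":
        # Three full-resolution 8-bit channels.
        return 3 * pixels
    raise ValueError(f"unsupported format: {fmt}")


def input_bandwidth_mb_s(width: int, height: int, fps: int, fmt: str) -> float:
    """Return the read bandwidth for the input stream in MB/s (1 MB = 1e6 bytes)."""
    return bytes_per_frame(width, height, fmt) * fps / 1e6


if __name__ == "__main__":
    for fmt in ("yuv420", "rgb888"):
        print(fmt, input_bandwidth_mb_s(WIDTH, HEIGHT, FPS, fmt), "MB/s")
```

Under these assumptions, a YUV 4:2:0 frame is half the size of an RGB888 frame, so consuming the sensor's native format would halve the input read traffic and avoid a separate color-conversion pass. Whether this matches the format used on the target SoC is not stated in the abstract.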