Abstract
Monocular geometric scene understanding combines panoptic segmentation and self-supervised depth estimation, focusing on real-time application in autonomous vehicles. We introduce MGNiceNet, a unified approach that uses a linked kernel formulation for panoptic segmentation and self-supervised depth estimation. MGNiceNet is based on the state-of-the-art real-time panoptic segmentation method RT-K-Net and extends the architecture to cover both panoptic segmentation and self-supervised monocular depth estimation. To this end, we introduce a tightly coupled self-supervised depth estimation predictor that explicitly uses information from the panoptic path for depth prediction. Furthermore, we introduce a panoptic-guided motion masking method to improve depth estimation without relying on video panoptic segmentation annotations. We evaluate our method on two popular autonomous driving datasets, Cityscapes and KITTI. Our model shows state-of-the-art results compared to other real-time methods and closes the gap to computationally more demanding methods. Source code and trained models are available at https://github.com/markusschoen/MGNiceNet.