Abstract
Obtaining precise instance segmentation masks is of high importance in many modern applications such as robotic manipulation and autonomous driving. Currently, many state-of-the-art models are based on the Mask R-CNN framework which, while very powerful, outputs masks at low resolutions, which can result in imprecise boundaries. On the other hand, classic variational methods for segmentation impose desirable global and local data and geometry constraints on the masks by optimizing an energy functional. While mathematically elegant, their direct dependence on good initialization, non-robust image cues and manual setting of hyperparameters renders them unsuitable for modern applications. We propose LevelSet R-CNN, which combines the best of both worlds by obtaining powerful feature representations that are combined in an end-to-end manner with a variational segmentation framework. We demonstrate the effectiveness of our approach on the COCO and Cityscapes datasets.