Abstract
Neural volumetric representations have shown that multi-layer perceptrons (MLPs) can be optimized with calibrated multi-view images to represent scene geometry and appearance without explicit 3D supervision. Object segmentation can enrich many downstream applications built on the learned radiance field. However, introducing hand-crafted segmentation to define regions of interest in a complex real-world scene is non-trivial and expensive, as it requires per-view annotation. This paper explores self-supervised learning for object segmentation using NeRF in complex real-world scenes. Our framework, called NeRF with Self-supervised Object Segmentation (NeRF-SOS), couples object segmentation with the neural radiance field to segment objects in any view within a scene. By proposing a novel collaborative contrastive loss at both the appearance and geometry levels, NeRF-SOS encourages NeRF models to distill compact, geometry-aware segmentation clusters from their density fields and from self-supervised pre-trained 2D visual features. The self-supervised object segmentation framework can be applied to various NeRF models, yielding both photo-realistic rendering results and convincing segmentation maps for indoor and outdoor scenarios. Extensive results on the LLFF, Tank & Temple, and BlendedMVS datasets validate the effectiveness of NeRF-SOS. It consistently surpasses other 2D-based self-supervised baselines and predicts finer semantic masks than existing supervised counterparts. Code is available at: https://github.com/VITA-Group/NeRF-SOS.