Abstract
We propose CAL (Complete Anything in Lidar) for Lidar-based shape completion in the wild. This is closely related to Lidar-based semantic/panoptic scene completion. However, contemporary methods can only complete and recognize objects from a closed vocabulary labeled in existing Lidar datasets. In contrast, our zero-shot approach leverages the temporal context from multi-modal sensor sequences to mine object shapes and semantic features of observed objects. These are then distilled into a Lidar-only instance-level completion and recognition model. Although we only mine partial shape completions, we find that our distilled model learns to infer full object shapes from multiple such partial observations across the dataset. We show that our model can be prompted on standard benchmarks for Semantic and Panoptic Scene Completion, localize objects as (amodal) 3D bounding boxes, and recognize objects beyond fixed class vocabularies. Our project page is https://research.nvidia.com/labs/dvl/projects/complete-anything-lidar