SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners

Abstract

We introduce SAM2Point, a preliminary exploration adapting Segment Anything Model 2 (SAM 2) for zero-shot and promptable 3D segmentation. SAM2Point interprets any 3D data as a series of multi-directional videos, and leverages SAM 2 for 3D-space segmentation, without further training or 2D-3D projection. Our framework supports various prompt types, including 3D points, boxes, and masks, and can generalize across diverse scenarios, such as 3D objects, indoor scenes, outdoor environments, and raw sparse LiDAR. Demonstrations on multiple 3D datasets, e.g., Objaverse, S3DIS, ScanNet, Semantic3D, and KITTI, highlight the robust generalization capabilities of SAM2Point. To the best of our knowledge, we present the most faithful implementation of SAM in 3D, which may serve as a starting point for future research in promptable 3D segmentation. Online Demo: https://huggingface.co/spaces/ZiyuG/SAM2Point . Code: https://github.com/ZiyuGuo99/SAM2Point .
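The idea of reading 3D data as multi-directional videos can be pictured with a small sketch. It is an illustration only, not the authors' implementation: it assumes the 3D data has already been voxelized into a dense grid, and the function name `directional_videos`, the `(X, Y, Z, C)` layout, and the choice of six axis-aligned directions starting from a point prompt are all assumptions made for this example. Each resulting frame sequence could then be passed to a video segmentation model such as SAM 2, which is not shown here.

```python
import numpy as np

def directional_videos(voxels, prompt):
    """Split a voxel grid into six frame sequences that start at the prompt.

    voxels: array of shape (X, Y, Z, C), e.g. one RGB color per voxel
            (assumed layout, hypothetical for this sketch).
    prompt: (x, y, z) integer voxel index of a 3D point prompt.
    Returns a dict mapping a direction label to a stack of 2D frames.
    """
    videos = {}
    for axis, name in enumerate("xyz"):
        # Move the slicing axis to the front so each frame is a 2D image.
        frames = np.moveaxis(voxels, axis, 0)
        start = prompt[axis]
        # Forward video: slices from the prompt to the far end of the axis.
        videos["+" + name] = frames[start:]
        # Backward video: slices from the prompt back to index 0.
        videos["-" + name] = frames[start::-1]
    return videos

# Toy grid: 8 x 6 x 4 voxels with 3 color channels.
grid = np.zeros((8, 6, 4, 3), dtype=np.uint8)
vids = directional_videos(grid, prompt=(2, 3, 1))
```

In this sketch every video begins with the slice that contains the prompt, so a 2D prompt on the first frame of each sequence corresponds to the same 3D location.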
