Point-In-Context: Understanding Point Cloud via In-Context Learning

Abstract

With the emergence of large-scale models trained on diverse datasets, in-context learning has emerged as a promising paradigm for multitasking, notably in natural language processing and image processing. However, its application in 3D point cloud tasks remains largely unexplored. In this work, we introduce Point-In-Context (PIC), a novel framework for 3D point cloud understanding via in-context learning. We address the technical challenge of effectively extending masked point modeling to 3D point clouds by introducing a Joint Sampling module and proposing a vanilla version of PIC called Point-In-Context-Generalist (PIC-G). PIC-G is designed as a generalist model for various 3D point cloud tasks, with inputs and outputs modeled as coordinates. In this paradigm, the challenging segmentation task is achieved by assigning a label point with XYZ coordinates to each category; the final prediction for each output point is then the category whose label point lies closest to it. To overcome the limitation of this fixed label-coordinate assignment, which generalizes poorly to novel classes, we propose two novel training strategies, In-Context Labeling and In-Context Enhancing. Together they form an extended version of PIC named Point-In-Context-Segmenter (PIC-S), aimed at improving dynamic context labeling and model training. By utilizing dynamic in-context labels and extra in-context pairs, PIC-S achieves enhanced performance and generalization capability within and across part segmentation datasets. PIC is a general framework, so other tasks or datasets can be seamlessly introduced into it through a unified data format. We conduct extensive experiments to validate the versatility and adaptability of our proposed methods in handling a wide range of tasks and segmenting across multiple datasets. Our PIC-S is capable of generalizing to unseen datasets and performing novel part segmentation by customizing prompts.
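
The label-point decoding step described in the abstract can be illustrated with a minimal sketch. It assumes each category has one fixed XYZ label point and that the model outputs one XYZ coordinate per input point. The category names, coordinates, and the function name `decode_segmentation` are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math

# Hypothetical label points: one fixed XYZ coordinate per part category.
# The actual coordinates used by PIC-G are not given in the abstract.
LABEL_POINTS = {
    "wing": (1.0, 0.0, 0.0),
    "body": (0.0, 1.0, 0.0),
    "tail": (0.0, 0.0, 1.0),
}


def decode_segmentation(predicted_points, label_points):
    """Map each predicted XYZ output to the category of its nearest label point."""
    labels = []
    for point in predicted_points:
        # Pick the category whose label point has the smallest Euclidean distance.
        nearest = min(
            label_points,
            key=lambda name: math.dist(point, label_points[name]),
        )
        labels.append(nearest)
    return labels


# Stand-in model outputs: coordinates that land near, but not exactly on, label points.
predictions = [(0.9, 0.1, 0.0), (0.1, 0.8, 0.2), (0.2, 0.1, 0.7)]
print(decode_segmentation(predictions, LABEL_POINTS))
```

Because the mapping from category to coordinate is fixed in this sketch, a category absent from `LABEL_POINTS` can never be predicted, which is the generalization limit the abstract attributes to the fixed assignment.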
