AGILE3D: Attention Guided Interactive Multi-object 3D Segmentation

Abstract

During interactive segmentation, a model and a user work together to delineate objects of interest in a 3D point cloud. In an iterative process, the model assigns each data point to an object (or the background), while the user corrects errors in the resulting segmentation and feeds them back into the model. From a machine learning perspective, the goal is to design the model and the feedback mechanism in a way that minimizes the required user input. The current best practice segments objects one at a time and asks the user to provide positive clicks to indicate regions wrongly assigned to the background and negative clicks to indicate regions wrongly assigned to the object (foreground). Sequentially visiting objects is wasteful, since it disregards synergies between objects: a positive click for a given object can, by definition, serve as a negative click for nearby objects. Moreover, direct competition between adjacent objects can speed up the identification of their common boundary. We introduce AGILE3D, an efficient, attention-based model that (1) supports simultaneous segmentation of multiple 3D objects, (2) yields more accurate segmentation masks with fewer user clicks, and (3) offers faster inference. We encode the point cloud into a latent feature representation, view user clicks as queries, and employ cross-attention to represent contextual relations between different click locations as well as between clicks and the 3D point cloud features. Every time new clicks are added, we only need to run a lightweight decoder that produces updated segmentation masks. In experiments with four different point cloud datasets, AGILE3D sets a new state of the art. Moreover, we verify its practicality in real-world setups with a real user study.
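
The click-as-query idea described above can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name `decode_masks`, the single-head attention without learned projections, the residual updates, and the per-object max over click logits are all assumptions made for illustration. It only shows how precomputed point features can be reused while a cheap decoder step turns clicks into competing per-object masks.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def decode_masks(point_feats, click_idx, click_obj):
    """Hypothetical lightweight decoder step (illustrative only).

    point_feats: (N, d) features from a backbone that is run once per scene.
    click_idx:   (K,) indices of the clicked points.
    click_obj:   (K,) object id of each click (0 could denote background).
    Returns an (N,) array with one object id per point.
    """
    d = point_feats.shape[1]
    # Assumption: each click query is initialised from the clicked point's feature.
    queries = point_feats[click_idx]                      # (K, d)

    # Click-to-scene cross-attention: every query gathers context from all points.
    attn_scene = softmax(queries @ point_feats.T / np.sqrt(d))   # (K, N)
    queries = queries + attn_scene @ point_feats

    # Click-to-click attention: queries exchange information with each other.
    attn_clicks = softmax(queries @ queries.T / np.sqrt(d))      # (K, K)
    queries = queries + attn_clicks @ queries

    # One logit per (point, click); clicks of the same object are merged by max
    # (an assumed fusion rule), then objects compete through a per-point argmax.
    logits = point_feats @ queries.T                      # (N, K)
    objects = np.unique(click_obj)
    per_object = np.stack(
        [logits[:, click_obj == o].max(axis=1) for o in objects], axis=1
    )
    return objects[per_object.argmax(axis=1)]


if __name__ == "__main__":
    # Toy scene: two groups of points with clearly separated features.
    feats = np.array([[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3)
    labels = decode_masks(feats, np.array([0, 3]), np.array([1, 2]))
    print(labels.tolist())  # [1, 1, 1, 2, 2, 2]
```

Because every point is assigned through a single argmax over all objects, a click that raises one object's logit at a location implicitly lowers the relative score of its neighbours there, which is the sense in which a positive click for one object acts as a negative click for the others. Adding a click only requires rerunning `decode_masks`; `point_feats` stays fixed.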
