QuadricFormer: Scene as Superquadrics for 3D Semantic Occupancy Prediction

Abstract

3D occupancy prediction is crucial for robust autonomous driving systems as it enables comprehensive perception of environmental structures and semantics. Most existing methods employ dense voxel-based scene representations, ignoring the sparsity of driving scenes and resulting in inefficiency. Recent works explore object-centric representations based on sparse Gaussians, but their ellipsoidal shape prior limits the modeling of diverse structures. In real-world driving scenes, objects exhibit rich geometries (e.g., cuboids, cylinders, and irregular shapes), necessitating excessive ellipsoidal Gaussians densely packed for accurate modeling, which leads to inefficient representations. To address this, we propose to use geometrically expressive superquadrics as scene primitives, enabling efficient representation of complex structures with fewer primitives through their inherent shape diversity. We develop a probabilistic superquadric mixture model, which interprets each superquadric as an occupancy probability distribution with a corresponding geometry prior, and calculates semantics through probabilistic mixture. Building on this, we present QuadricFormer, a superquadric-based model for efficient 3D occupancy prediction, and introduce a pruning-and-splitting module to further enhance modeling efficiency by concentrating superquadrics in occupied regions. Extensive experiments on the nuScenes dataset demonstrate that QuadricFormer achieves state-of-the-art performance while maintaining superior efficiency.
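
To make the shape argument concrete, the following is a minimal sketch of how a single superquadric can cover a box-like region that an ellipsoid of the same extent misses. It uses the standard axis-aligned superquadric inside-outside function; the mapping from that function to an occupancy probability (`exp(-f)`) and the mixture rule (`1 - prod(1 - p_i)`) are illustrative assumptions, not the formulation used by QuadricFormer, and all function and parameter names are hypothetical.

```python
import math


def inside_outside(point, center, scales, eps1, eps2):
    """Standard axis-aligned superquadric inside-outside function.

    Returns a value < 1 inside the surface, 1 on it, and > 1 outside.
    eps1 = eps2 = 1 gives an ellipsoid; small exponents give box-like shapes.
    """
    x, y, z = ((p - c) / s for p, c, s in zip(point, center, scales))
    xy = (abs(x) ** (2.0 / eps2) + abs(y) ** (2.0 / eps2)) ** (eps2 / eps1)
    return xy + abs(z) ** (2.0 / eps1)


def occupancy_probability(point, center, scales, eps1, eps2):
    """Assumed mapping (not from the paper): probability decays as exp(-f)."""
    return math.exp(-inside_outside(point, center, scales, eps1, eps2))


def mixture_occupancy(point, primitives):
    """Assumed mixture rule: a point is empty only if every primitive
    independently leaves it empty."""
    empty = 1.0
    for prim in primitives:
        empty *= 1.0 - occupancy_probability(point, **prim)
    return 1.0 - empty


# A point near the corner of the unit cube.
corner = (0.9, 0.9, 0.9)
unit = dict(center=(0.0, 0.0, 0.0), scales=(1.0, 1.0, 1.0))

f_ellipsoid = inside_outside(corner, eps1=1.0, eps2=1.0, **unit)
f_boxlike = inside_outside(corner, eps1=0.1, eps2=0.1, **unit)

print("ellipsoid covers corner:", f_ellipsoid < 1.0)
print("box-like superquadric covers corner:", f_boxlike < 1.0)
```

Under these assumptions, the corner point lies outside the unit ellipsoid but inside the box-like superquadric with the same scales, which is the kind of region that would otherwise need several additional ellipsoidal primitives to fill.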
