LION: Linear Group RNN for 3D Object Detection in Point Clouds

Abstract

The benefit of transformers in large-scale 3D point cloud perception tasks, such as 3D object detection, is limited by their quadratic computation cost when modeling long-range relationships. In contrast, linear RNNs have low computational complexity and are suitable for long-range modeling. Toward this goal, we propose a simple and effective window-based framework built on LInear grOup RNN (i.e., performing linear RNN on grouped features) for accurate 3D object detection, called LION. Its key property is to allow sufficient feature interaction within a much larger group than transformer-based methods can afford. However, effectively applying linear group RNN to 3D object detection in highly sparse point clouds is not trivial because of its limited ability to model spatial information. To tackle this problem, we introduce a 3D spatial feature descriptor and integrate it into the linear group RNN operators to enhance their spatial features, rather than blindly increasing the number of scanning orders for voxel features. To further address the challenge of highly sparse point clouds, we propose a 3D voxel generation strategy that densifies foreground features by leveraging the auto-regressive property natural to linear group RNN. Extensive experiments verify the effectiveness of the proposed components and the generalization of LION across different linear group RNN operators, including Mamba, RWKV, and RetNet. Furthermore, LION-Mamba achieves state-of-the-art results on the Waymo, nuScenes, Argoverse V2, and ONCE datasets. Finally, our method supports a variety of advanced linear RNN operators (e.g., RetNet, RWKV, Mamba, xLSTM, and TTT) on the small but popular KITTI dataset for a quick experience with our linear RNN-based framework.
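To make the idea of "linear RNN on grouped features" concrete, the following is a minimal sketch in plain Python, not the LION implementation. It assumes the voxel features have already been serialized into a single ordered list, and it substitutes a toy recurrence h_t = decay * h_{t-1} + gain * x_t for operators such as Mamba, RWKV, or RetNet. The function names, the `decay` and `gain` parameters, and the fixed-size grouping are all hypothetical choices made for illustration.

```python
from typing import List

Feature = List[float]


def group_features(features: List[Feature], group_size: int) -> List[List[Feature]]:
    """Split an ordered list of voxel features into consecutive groups."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return [features[i:i + group_size] for i in range(0, len(features), group_size)]


def linear_rnn(group: List[Feature], decay: float = 0.5, gain: float = 1.0) -> List[Feature]:
    """Toy linear recurrence over one group, starting from a zero state.

    Assumption: h_t = decay * h_{t-1} + gain * x_t stands in for a real
    linear RNN operator; it is not the operator used in the paper.
    """
    if not group:
        return []
    h = [0.0] * len(group[0])
    outputs = []
    for x in group:
        # One constant-cost update per voxel, so a group costs O(group length).
        h = [decay * h_j + gain * x_j for h_j, x_j in zip(h, x)]
        outputs.append(h)
    return outputs


def linear_group_rnn(features: List[Feature], group_size: int,
                     decay: float = 0.5, gain: float = 1.0) -> List[Feature]:
    """Apply the toy recurrence independently inside each group."""
    out: List[Feature] = []
    for group in group_features(features, group_size):
        out.extend(linear_rnn(group, decay, gain))
    return out


if __name__ == "__main__":
    voxels = [[1.0], [2.0], [4.0], [8.0], [16.0]]
    print(linear_group_rnn(voxels, group_size=2))
```

Because each voxel is visited once per pass, the cost of this sketch grows linearly with the number of voxels regardless of the group size, whereas pairwise attention inside a group grows quadratically with the group length. This is the property that makes larger groups affordable for a linear operator.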
