Abstract
High-definition (HD) maps provide abundant and precise static environmental information about the driving scene and serve as a fundamental and indispensable component for planning in autonomous driving systems. In this paper, we present \textbf{Map} \textbf{TR}ansformer, an end-to-end framework for online vectorized HD map construction. We propose a unified permutation-equivalent modeling approach, \ie, modeling each map element as a point set with a group of equivalent permutations, which accurately describes the shape of the map element and stabilizes the learning process. We design a hierarchical query embedding scheme to flexibly encode structured map information and perform hierarchical bipartite matching for map element learning. To speed up convergence, we further introduce auxiliary one-to-many matching and dense supervision. The proposed method copes well with map elements of arbitrary shape. It runs at real-time inference speed and achieves state-of-the-art performance on both the nuScenes and Argoverse2 datasets. Abundant qualitative results show stable and robust map construction quality in complex and varied driving scenes. Code and more demos are available at \url{https://github.com/hustvl/MapTR} to facilitate further studies and applications.
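The idea of a point set with a group of equivalent permutations can be illustrated with a minimal sketch. It is not taken from the released code. The function name `equivalent_permutations` is hypothetical, and the sketch assumes that an open polyline has two equivalent orderings (forward and reverse), while a closed polygon with N points has 2N (every starting point, in either direction).

```python
from typing import List, Tuple

Point = Tuple[float, float]


def equivalent_permutations(points: List[Point], closed: bool) -> List[List[Point]]:
    """Enumerate orderings of `points` that trace the same shape.

    Assumption (not stated in the abstract): an open polyline admits 2
    orderings, a closed polygon with N points admits 2 * N orderings.
    """
    forward = list(points)
    backward = forward[::-1]
    if not closed:
        # Open polyline: traverse from either end.
        return [forward, backward]
    perms: List[List[Point]] = []
    for base in (forward, backward):
        # Closed polygon: any vertex may serve as the starting point.
        for shift in range(len(base)):
            perms.append(base[shift:] + base[:shift])
    return perms


# Hypothetical example elements: a lane divider and a pedestrian crossing.
lane_divider = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.5)]
crossing = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

divider_perms = equivalent_permutations(lane_divider, closed=False)
crossing_perms = equivalent_permutations(crossing, closed=True)
```

Under this assumption, a training loss could compare a prediction against every ordering in the group and keep the best match, so that no single arbitrary ordering is imposed as the ground truth.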