Mask2Map: Vectorized HD Map Construction Using Bird's Eye View Segmentation Masks

Abstract

In this paper, we introduce Mask2Map, a novel end-to-end online HD map construction method designed for autonomous driving applications. Our approach focuses on predicting the class and ordered point set of map instances within a scene, represented in the bird's eye view (BEV). Mask2Map consists of two primary components: the Instance-Level Mask Prediction Network (IMPNet) and the Mask-Driven Map Prediction Network (MMPNet). IMPNet generates Mask-Aware Queries and BEV Segmentation Masks to capture comprehensive semantic information globally. Subsequently, MMPNet enhances these query features using local contextual information through two submodules: the Positional Query Generator (PQG) and the Geometric Feature Extractor (GFE). PQG extracts instance-level positional queries by embedding BEV positional information into Mask-Aware Queries, while GFE utilizes BEV Segmentation Masks to generate point-level geometric features. However, we observed limited performance in Mask2Map due to inter-network inconsistency stemming from different predictions to Ground Truth (GT) matching between IMPNet and MMPNet. To tackle this challenge, we propose the Inter-network Denoising Training method, which guides the model to denoise the output affected by both noisy GT queries and perturbed GT Segmentation Masks. Our evaluation conducted on nuScenes and Argoverse2 benchmarks demonstrates that Mask2Map achieves remarkable performance improvements over previous state-of-the-art methods, with gains of 10.1% mAP and 4.1 mAP, respectively. Our code can be found at https://github.com/SehwanChoi0307/Mask2Map.
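
To make the idea of denoising inputs concrete, the following is a minimal illustrative sketch of how noisy GT queries and perturbed GT Segmentation Masks might be produced. It is not taken from the Mask2Map code: the function names (`flip_labels`, `shift_mask`, `perturb_mask`), the label-flipping and mask-shifting noise models, and the parameters `flip_prob` and `max_shift` are all assumptions chosen for illustration, and the abstract does not specify the actual noise scheme.

```python
import random


def flip_labels(labels, num_classes, flip_prob, rng):
    """Hypothetical noise model for GT queries: with probability
    flip_prob, replace a GT class label with a uniformly drawn class."""
    noisy = []
    for label in labels:
        if rng.random() < flip_prob:
            noisy.append(rng.randrange(num_classes))
        else:
            noisy.append(label)
    return noisy


def shift_mask(mask, dx, dy):
    """Translate a binary BEV mask (list of rows) by (dx, dy) cells.
    Cells shifted outside the grid are dropped; vacated cells are 0."""
    height, width = len(mask), len(mask[0])
    shifted = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            ny, nx = y + dy, x + dx
            if mask[y][x] and 0 <= ny < height and 0 <= nx < width:
                shifted[ny][nx] = 1
    return shifted


def perturb_mask(mask, max_shift, rng):
    """Hypothetical noise model for GT masks: apply a random
    translation of at most max_shift cells along each axis."""
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)
    return shift_mask(mask, dx, dy)


if __name__ == "__main__":
    rng = random.Random(0)
    gt_labels = [0, 1, 2, 1]  # e.g. one class index per map instance
    gt_mask = [
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0],
    ]
    print(flip_labels(gt_labels, num_classes=3, flip_prob=0.2, rng=rng))
    print(perturb_mask(gt_mask, max_shift=1, rng=rng))
```

In a training setup of this kind, the perturbed labels and masks would be fed to the second network alongside the clean GT as the reconstruction target, so that both networks are supervised against the same GT instances.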
