Abstract
Bird's-Eye View (BEV) Perception has received increasing attention in recent years as it provides a concise and unified spatial representation across views and benefits a diverse set of downstream driving applications. While the focus has been placed on discriminative tasks such as BEV segmentation, the dual generative task of creating street-view images from a BEV layout has rarely been explored. The ability to generate realistic street-view images that align with a given HD map and traffic layout is critical for visualizing complex traffic scenarios and developing robust perception models for autonomous driving. In this paper, we propose BEVGen, a conditional generative model that synthesizes a set of realistic and spatially consistent surrounding images that match the BEV layout of a traffic scenario. BEVGen incorporates a novel cross-view transformation and spatial attention design which learn the relationship between cameras and map views to ensure their consistency. Our model can accurately render road and lane lines, as well as generate traffic scenes under different weather conditions and times of day. The code will be made publicly available.