RenderWorld: World Model with Self-Supervised 3D Label

  • 2024-09-17 18:00:52
  • Ziyang Yan, Wenzhen Dong, Yihua Shao, Yuhang Lu, Liu Haiyang, Jingwen Liu, Haozhe Wang, Zhe Wang, Yan Wang, Fabio Remondino, Yuexin Ma

Abstract

Vision-only end-to-end autonomous driving is not only more cost-effective than LiDAR-vision fusion but also more reliable than traditional methods. To achieve an economical and robust purely visual autonomous driving system, we propose RenderWorld, a vision-only end-to-end autonomous driving framework that generates 3D occupancy labels using a self-supervised Gaussian-based Img2Occ module, encodes the labels with AM-VAE, and uses a world model for forecasting and planning. RenderWorld employs Gaussian Splatting to represent 3D scenes and render 2D images, which greatly improves segmentation accuracy and reduces GPU memory consumption compared with NeRF-based methods. By applying AM-VAE to encode air and non-air separately, RenderWorld achieves a more fine-grained representation of scene elements, leading to state-of-the-art performance in both 4D occupancy forecasting and motion planning with the autoregressive world model.
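
The idea of treating air and non-air voxels separately can be illustrated with a minimal NumPy sketch. This is not the paper's AM-VAE; it only shows how a semantic occupancy grid might be split into an air mask and a non-air label grid before each part is passed to its own encoder. The label id `AIR = 0`, the ignore value `-1`, and the function names are assumptions made for illustration.

```python
import numpy as np

AIR = 0      # assumed label id for empty space (hypothetical)
IGNORE = -1  # assumed marker for voxels the non-air branch skips (hypothetical)

def split_air_non_air(occ, air_id=AIR):
    """Split a semantic occupancy grid into an air mask and a non-air label grid."""
    air_mask = occ == air_id
    # Keep semantic labels only where the voxel is occupied.
    non_air = np.where(air_mask, IGNORE, occ)
    return air_mask, non_air

def merge_air_non_air(air_mask, non_air, air_id=AIR):
    """Recombine the two parts into a single occupancy grid."""
    return np.where(air_mask, air_id, non_air)

# Toy 2x2x2 grid: 0 is air, other integers are semantic classes.
occ = np.array([[[0, 0], [3, 0]],
                [[0, 7], [7, 0]]])
air_mask, non_air = split_air_non_air(occ)

# The split is lossless: merging the two parts restores the original grid.
assert np.array_equal(merge_air_non_air(air_mask, non_air), occ)
```

In a real pipeline each part would be encoded by a learned model; the round-trip check here only confirms that the separation itself discards no information.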

 
