Can LVLMs Obtain a Driver's License? A Benchmark Towards Reliable AGI for Autonomous Driving

Abstract

Large Vision-Language Models (LVLMs) have recently garnered significant attention, with many efforts aimed at harnessing their general knowledge to enhance the interpretability and robustness of autonomous driving models. However, LVLMs typically rely on large, general-purpose datasets and lack the specialized expertise required for professional and safe driving. Existing vision-language driving datasets focus primarily on scene understanding and decision-making, without providing explicit guidance on traffic rules and driving skills, which are critical aspects directly related to driving safety. To bridge this gap, we propose IDKB, a large-scale dataset containing over one million data items collected from various countries, including driving handbooks, theory test data, and simulated road test data. Much like the process of obtaining a driver's license, IDKB encompasses nearly all the explicit knowledge needed for driving from theory to practice. In particular, we conducted comprehensive tests on 15 LVLMs using IDKB to assess their reliability in the context of autonomous driving and provided extensive analysis. We also fine-tuned popular models, achieving notable performance improvements, which further validates the significance of our dataset. The project page can be found at: https://4dvlab.github.io/project_page/idkb.html
