LaViPlan: Language-Guided Visual Path Planning with RLVR

Abstract

Out-of-distribution (OOD) scenarios in autonomous driving refer to situations that deviate from the training domain, often leading to unexpected and potentially hazardous behavior from planners that lack prior exposure to such cases. Recently, Vision-Language Models (VLMs) have been introduced into autonomous driving research for their promising generalization capabilities in OOD settings. Early studies demonstrated that VLMs could recognize OOD scenarios and generate user-level decisions such as "go straight" or "turn right." However, a new challenge has emerged due to the misalignment between the VLM's high-level decisions or visual reasoning expressed in language, and the low-level predicted trajectories interpreted as actions. In this paper, we propose LaViPlan, a framework that leverages Reinforcement Learning with Verifiable Rewards (RLVR) to optimize VLMs using planning-oriented metrics. This approach addresses the vision-language-action misalignment observed in existing VLMs fine-tuned via supervised learning, which can recognize driving scenarios but often produce context-unaware decisions. Experimental results demonstrate that our method improves situational awareness and decision-making under OOD conditions, highlighting its potential to mitigate the misalignment issue. This work introduces a promising post-training paradigm for VLM agents in the context of autonomous driving.
