Abstract
Reinforcement learning plays a crucial role in vehicle control by guiding agents to learn optimal control strategies through designed or learned reward signals. However, in vehicle control applications, rewards typically need to be manually designed while considering multiple implicit factors, which easily introduces human bias. Although imitation learning methods do not rely on explicit reward signals, they require high-quality expert actions, which are often challenging to acquire. To address these issues, we propose a reward-free reinforcement learning framework (RFRLF). This framework directly learns the target states to optimize agent behavior through a target state prediction network (TSPN) and a reward-free state-guided policy network (RFSGPN), avoiding the dependence on manually designed reward signals. Specifically, the policy network is learned by minimizing the difference between the predicted state and the expert state. Experimental results demonstrate the effectiveness of the proposed RFRLF in controlling vehicle driving, showing its advantages in improving learning efficiency and adapting to reward-free environments.