Abstract
In the field of autonomous driving, developing safe and trustworthy autonomous driving policies remains a significant challenge. Recently, Reinforcement Learning with Human Feedback (RLHF) has attracted substantial attention due to its potential to enhance training safety and sampling efficiency. Nevertheless, existing RLHF-enabled methods often falter when faced with imperfect human demonstrations, potentially leading to training oscillations or even worse performance than rule-based approaches. Inspired by the human learning process, we propose Physics-enhanced Reinforcement Learning with Human Feedback (PE-RLHF). This novel framework synergistically integrates human feedback (e.g., human intervention and demonstration) and physics knowledge (e.g., traffic flow model) into the training loop of reinforcement learning. The key advantage of PE-RLHF is its guarantee that the learned policy will perform at least as well as the given physics-based policy, even when human feedback quality deteriorates, thus ensuring trustworthy safety improvements. PE-RLHF introduces a Physics-enhanced Human-AI (PE-HAI) collaborative paradigm for dynamic action selection between human and physics-based actions, employs a reward-free approach with a proxy value function to capture human preferences, and incorporates a minimal intervention mechanism to reduce the cognitive load on human mentors. Extensive experiments across diverse driving scenarios demonstrate that PE-RLHF significantly outperforms traditional methods, achieving state-of-the-art (SOTA) performance in safety, efficiency, and generalizability, even with varying quality of human feedback. The philosophy behind PE-RLHF not only advances autonomous driving technology but can also offer valuable insights for other safety-critical domains. Demo video and code are available at: https://zilin-huang.github.io/PE-RLHF-website/