PRISM: Preference Refinement via Implicit Scene Modeling for 3D Vision-Language Preference-Based Reinforcement Learning

Abstract

We propose PRISM, a novel framework designed to overcome the limitations of 2D-based Preference-Based Reinforcement Learning (PBRL) by unifying 3D point cloud modeling and future-aware preference refinement. At its core, PRISM adopts a 3D Point Cloud-Language Model (3D-PC-LLM) to mitigate occlusion and viewpoint biases, ensuring more stable and spatially consistent preference signals. Additionally, PRISM leverages Chain-of-Thought (CoT) reasoning to incorporate long-horizon considerations, thereby preventing the short-sighted feedback often seen in static preference comparisons. In contrast to conventional PBRL techniques, this integration of 3D perception and future-oriented reasoning leads to significant gains in preference agreement rates, faster policy convergence, and robust generalization across unseen robotic environments. Our empirical results, spanning tasks such as robotic manipulation and autonomous navigation, highlight PRISM's potential for real-world applications where precise spatial understanding and reliable long-term decision-making are critical. By bridging 3D geometric awareness with CoT-driven preference modeling, PRISM establishes a comprehensive foundation for scalable, human-aligned reinforcement learning.
