Reinforcement Learning with Foundation Priors: Let the Embodied Agent Efficiently Learn on Its Own

Abstract

Reinforcement learning (RL) is a promising approach for solving robotic manipulation tasks. However, it is challenging to apply RL algorithms directly in the real world. For one thing, RL is data-intensive and typically requires millions of interactions with the environment, which is impractical in real scenarios. For another, it requires heavy engineering effort to design reward functions manually. To address these issues, we leverage foundation models in this paper. We propose Reinforcement Learning with Foundation Priors (RLFP) to utilize guidance and feedback from policy, value, and success-reward foundation models. Within this framework, we introduce the Foundation-guided Actor-Critic (FAC) algorithm, which enables embodied agents to explore more efficiently with automatic reward functions. The benefits of our framework are threefold: (1) sample efficiency; (2) minimal yet effective reward engineering; (3) agnosticism to the form of the foundation models and robustness to noisy priors. Our method achieves strong performance in various manipulation tasks, both on real robots and in simulation. Across 5 dexterous tasks with real robots, FAC achieves an average success rate of 86% after one hour of real-time learning. Across 8 tasks in the simulated Meta-world, FAC achieves 100% success rates in 7/8 tasks in fewer than 100k frames (about 1 hour of training), outperforming baseline methods that use manually designed rewards and 1M frames. We believe the RLFP framework can enable future robots to explore and learn autonomously in the physical world for more tasks. Visualizations and code are available at https://yewr.github.io/rlfp.
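The abstract does not say how the value and success-reward priors are turned into an automatic reward. As an illustrative sketch only, one common way to combine a sparse success signal with a value estimate is potential-based reward shaping. The function name `shaped_reward`, the discount `gamma`, and the toy value numbers below are assumptions made for this example and are not taken from the paper.

```python
def shaped_reward(success, value_s, value_next, gamma=0.99):
    """Sparse success reward plus a potential-based shaping term.

    ASSUMPTION: this is a generic reward-shaping recipe, not necessarily
    the exact reward used by FAC.

    success    -- bool, hypothetical output of a success-reward model
    value_s    -- float, hypothetical value-prior estimate for the current state
    value_next -- float, hypothetical value-prior estimate for the next state
    """
    sparse = 1.0 if success else 0.0
    shaping = gamma * value_next - value_s  # reward progress under the value prior
    return sparse + shaping


if __name__ == "__main__":
    # Toy trajectory: made-up value-prior estimates for four consecutive states.
    values = [0.1, 0.4, 0.3, 0.9]
    for t in range(len(values) - 1):
        done = t == len(values) - 2  # pretend the final transition succeeds
        print(t, shaped_reward(done, values[t], values[t + 1]))
```

With `gamma=1.0` the shaping terms telescope along a trajectory to the difference between the last and first value estimates, so a shaping term of this kind rewards progress at every step while the sparse success term still defines the task.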
