LaMOuR: Leveraging Language Models for Out-of-Distribution Recovery in Reinforcement Learning

Abstract

Deep Reinforcement Learning (DRL) has demonstrated strong performance in robotic control but remains susceptible to out-of-distribution (OOD) states, often resulting in unreliable actions and task failure. While previous methods have focused on minimizing or preventing OOD occurrences, they largely neglect recovery once an agent encounters such states. Although recent work has attempted to address this by guiding agents back to in-distribution states, its reliance on uncertainty estimation hinders scalability in complex environments. To overcome this limitation, we introduce Language Models for Out-of-Distribution Recovery (LaMOuR), which enables recovery learning without relying on uncertainty estimation. LaMOuR generates dense reward codes that guide the agent back to a state where it can successfully perform its original task, leveraging the capabilities of large vision-language models (LVLMs) in image description, logical reasoning, and code generation. Experimental results show that LaMOuR substantially enhances recovery efficiency across diverse locomotion tasks and even generalizes effectively to complex environments, including humanoid locomotion and mobile manipulation, where existing methods struggle. The code and supplementary materials are available at https://lamour-rl.github.io/.
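
To make the idea of a "dense reward code" concrete, the following is a minimal illustrative sketch of what such a function might look like for a legged agent that has fallen over. It is not code produced by LaMOuR or taken from the paper: the function name `recovery_reward`, the state variables `torso_height` and `torso_tilt`, the target height, and the equal weighting of the two terms are all assumptions chosen for illustration.

```python
import math


def recovery_reward(torso_height: float, torso_tilt: float,
                    target_height: float = 1.0) -> float:
    """Hypothetical dense recovery reward (illustrative assumption only).

    torso_height:  current height of the torso above the ground, in meters.
    torso_tilt:    angle between the torso axis and vertical, in radians.
    target_height: assumed standing height at which the original task
                   policy is expected to work again.
    """
    # Grows smoothly toward 1 as the torso approaches the target height.
    height_term = math.exp(-abs(target_height - torso_height))
    # Equals 1 when the torso is vertical and 0 when it is horizontal.
    upright_term = math.cos(torso_tilt)
    # Dense signal: every small improvement in posture changes the reward.
    return height_term + upright_term


# A fallen pose scores lower than an upright pose, so an agent maximizing
# this reward is pushed back toward a state it can act from.
fallen = recovery_reward(torso_height=0.2, torso_tilt=math.pi / 2)
standing = recovery_reward(torso_height=1.0, torso_tilt=0.0)
```

The point of the sketch is only that the reward varies continuously with the agent's state, in contrast to a sparse signal that is non-zero only once recovery has fully succeeded.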
