PixL2R: Guiding Reinforcement Learning Using Natural Language by Mapping Pixels to Rewards

Abstract

Reinforcement learning (RL), particularly in sparse reward settings, often requires prohibitively large numbers of interactions with the environment, thereby limiting its applicability to complex problems. To address this, several prior approaches have used natural language to guide the agent's exploration. However, these approaches typically operate on structured representations of the environment, and/or assume some structure in the natural language commands. In this work, we propose a model that directly maps pixels to rewards, given a free-form natural language description of the task, which can then be used for policy learning. Our experiments on the Meta-World robot manipulation domain show that language-based rewards significantly improve the sample efficiency of policy learning, both in sparse and dense reward settings.
