Can LLMs Translate Human Instructions into a Reinforcement Learning Agent's Internal Emergent Symbolic Representation?

Abstract

Emergent symbolic representations are critical for enabling developmental learning agents to plan and generalize across tasks. In this work, we investigate whether large language models (LLMs) can translate human natural language instructions into the internal symbolic representations that emerge during hierarchical reinforcement learning. We apply a structured evaluation framework to measure the translation performance of commonly used LLMs (GPT, Claude, DeepSeek, and Grok) across different internal symbolic partitions generated by a hierarchical reinforcement learning algorithm in the Ant Maze and Ant Fall environments. Our findings reveal that although LLMs demonstrate some ability to translate natural language into a symbolic representation of the environment dynamics, their performance is highly sensitive to partition granularity and task complexity. The results expose limitations in current LLMs' capacity for representation alignment, highlighting the need for further research on robust alignment between language and internal agent representations.
