HERAKLES: Hierarchical Skill Compilation for Open-ended LLM Agents

Abstract

Open-ended AI agents need to be able to efficiently learn goals of increasing complexity, abstraction and heterogeneity over their lifetime. Beyond efficiently sampling their own goals, autotelic agents specifically need to be able to keep the growing complexity of goals under control, limiting the associated growth in sample and computational complexity. To address this challenge, recent approaches have leveraged hierarchical reinforcement learning (HRL) and language, capitalizing on its compositional and combinatorial generalization capabilities to acquire temporally extended reusable behaviours. Existing approaches use expert-defined spaces of subgoals over which they instantiate a hierarchy, and often assume pre-trained associated low-level policies. Such designs are inadequate in open-ended scenarios, where goal spaces naturally diversify across a broad spectrum of difficulties. We introduce HERAKLES, a framework that enables a two-level hierarchical autotelic agent to continuously compile mastered goals into the low-level policy, executed by a small, fast neural network, dynamically expanding the set of subgoals available to the high-level policy. We train a Large Language Model (LLM) to serve as the high-level controller, exploiting its strengths in goal decomposition and generalization to operate effectively over this evolving subgoal space. We evaluate HERAKLES in the open-ended Crafter environment and show that it scales effectively with goal complexity, improves sample efficiency through skill compilation, and enables the agent to adapt robustly to novel challenges over time.
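The bookkeeping behind "compiling mastered goals" into an expanding subgoal set can be illustrated with a minimal sketch. This is not the HERAKLES implementation: the class name SkillLibrary, the success-rate mastery criterion, the threshold and window values, and the goal strings are all assumptions made for illustration. The sketch covers only the promotion of a goal to a subgoal; the distillation of the goal into the small low-level network is left out.

```python
from dataclasses import dataclass, field


@dataclass
class SkillLibrary:
    """Hypothetical registry of subgoals the high-level policy may call."""

    subgoals: set[str]
    mastery_threshold: float = 0.9  # assumed value, not taken from the paper
    window: int = 20                # assumed number of recent episodes to average
    history: dict[str, list[bool]] = field(default_factory=dict)

    def record(self, goal: str, success: bool) -> None:
        """Store the outcome of one episode attempted on `goal`."""
        self.history.setdefault(goal, []).append(success)

    def success_rate(self, goal: str) -> float:
        """Success rate over the last `window` episodes (0.0 if none)."""
        recent = self.history.get(goal, [])[-self.window:]
        return sum(recent) / len(recent) if recent else 0.0

    def compile_mastered(self) -> list[str]:
        """Promote goals that look mastered into the subgoal set.

        A full system would also train the low-level network on these
        goals at this point; that step is omitted here.
        """
        newly = sorted(
            g for g in self.history
            if g not in self.subgoals
            and len(self.history[g]) >= self.window
            and self.success_rate(g) >= self.mastery_threshold
        )
        self.subgoals.update(newly)
        return newly


# Illustrative goal names only.
lib = SkillLibrary(subgoals={"move", "collect wood"}, window=5)
for _ in range(5):
    lib.record("place table", True)
    lib.record("make wood pickaxe", False)
print(lib.compile_mastered())
```

Under these assumptions, only the goal that succeeded in every recent episode is promoted, and the high-level controller would then be able to select it as a single subgoal rather than re-planning its steps.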
