Self-Refined Large Language Model as Automated Reward Function Designer for Deep Reinforcement Learning in Robotics

Abstract

Although Deep Reinforcement Learning (DRL) has achieved notable success in numerous robotic applications, designing a high-performing reward function remains a challenging task that often requires substantial manual input. Recently, Large Language Models (LLMs) have been extensively adopted to address tasks demanding in-depth common-sense knowledge, such as reasoning and planning. Because reward function design is also inherently linked to such knowledge, LLMs offer promising potential in this context. Motivated by this, we propose a novel LLM framework with a self-refinement mechanism for automated reward function design. The framework begins with the LLM formulating an initial reward function from natural language inputs. The performance of that reward function is then assessed, and the results are fed back to the LLM to guide its self-refinement. We examine the performance of the proposed framework on a variety of continuous robotic control tasks across three diverse robotic systems. The results indicate that our LLM-designed reward functions can rival or even surpass manually designed reward functions, highlighting the efficacy and applicability of our approach.
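The propose, assess, and refine loop described in the abstract can be sketched roughly as follows. This is only an illustrative toy under stated assumptions, not the authors' implementation: `self_refine`, `toy_llm`, `toy_evaluate`, the `distance_weight` parameter, the success-rate target, and the feedback format are all hypothetical stand-ins. A real system would query an actual LLM and train a DRL policy to obtain the evaluation results.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical types: a "reward design" is reduced here to a dict of weights.
Design = Dict[str, float]
Feedback = Dict[str, object]


def self_refine(
    propose: Callable[[str, Optional[Feedback]], Design],
    evaluate: Callable[[Design], Tuple[float, Feedback]],
    task_description: str,
    max_iters: int = 5,
    target: float = 0.9,
) -> Tuple[Design, float]:
    """Iteratively propose a reward design, assess it, and feed results back."""
    feedback: Optional[Feedback] = None
    best: Optional[Tuple[Design, float]] = None
    for _ in range(max_iters):
        candidate = propose(task_description, feedback)  # LLM step (stubbed)
        score, feedback = evaluate(candidate)            # assessment step (stubbed)
        if best is None or score > best[1]:
            best = (candidate, score)
        if score >= target:  # stop once the design is good enough
            break
    assert best is not None
    return best


def toy_llm(task: str, feedback: Optional[Feedback]) -> Design:
    """Stand-in for an LLM call: starts from a guess, then follows the hint."""
    if feedback is None:
        return {"distance_weight": 0.5}
    previous = feedback["previous"]["distance_weight"]
    step = 0.5 if feedback["hint"] == "increase" else -0.5
    return {"distance_weight": previous + step}


def toy_evaluate(design: Design) -> Tuple[float, Feedback]:
    """Stand-in for DRL training plus evaluation: the score peaks at weight 2.0."""
    w = design["distance_weight"]
    success_rate = max(0.0, 1.0 - abs(w - 2.0) / 2.0)
    hint = "increase" if w < 2.0 else "decrease"
    return success_rate, {"previous": design, "success_rate": success_rate, "hint": hint}


best_candidate, best_score = self_refine(toy_llm, toy_evaluate, "reach the target position")
print(best_candidate, best_score)
```

Passing the proposer and evaluator in as callables keeps the loop itself independent of any particular LLM API or training setup, which is why both are stubbed here.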
