Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning

Abstract

Reinforcement learning (RL) often encounters delayed and sparse feedback in real-world applications, sometimes with only episodic rewards. Previous approaches have made some progress in reward redistribution for credit assignment but still face challenges, including training difficulties due to redundancy and ambiguous attributions stemming from overlooking the multifaceted nature of mission performance evaluation. Fortunately, large language models (LLMs) encompass rich decision-making knowledge and provide a plausible tool for reward redistribution. Even so, deploying an LLM in this setting is non-trivial due to the misalignment between linguistic knowledge and the symbolic form requirement, together with inherent randomness and hallucinations in inference. To tackle these issues, we introduce LaRe, a novel LLM-empowered, symbolic-based decision-making framework, to improve credit assignment. Key to LaRe is the concept of the Latent Reward, which works as a multi-dimensional performance evaluation, enabling more interpretable goal attainment from various perspectives and facilitating more effective reward redistribution. We show that semantically generated code from an LLM can bridge linguistic knowledge and symbolic latent rewards, as it is executable on symbolic objects. Meanwhile, we design latent reward self-verification to increase the stability and reliability of LLM inference. Theoretically, eliminating reward-irrelevant redundancy in the latent reward benefits RL performance through more accurate reward estimation. Extensive experimental results show that LaRe (i) achieves superior temporal credit assignment to SOTA methods, (ii) excels in allocating contributions among multiple agents, and (iii) outperforms policies trained with ground truth rewards for certain tasks.
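
To make the idea concrete, the following is a minimal sketch of reward redistribution through a latent reward, not the LaRe implementation. It rests on several assumptions that are not in the abstract: the function `latent_reward` is a hand-written stand-in for LLM-generated code, the state keys (`distance_to_goal`, `energy_used`, `upright`) are hypothetical, the self-verification step is reduced to a simple execution and shape check, and the reward model is a linear least-squares fit that makes the per-step proxy rewards sum to the episodic return.

```python
import numpy as np


def latent_reward(state):
    """Hypothetical stand-in for an LLM-generated function: maps a symbolic
    state (here a dict) to a small vector of performance factors."""
    return np.array([
        -abs(state["distance_to_goal"]),   # progress toward the goal
        -state["energy_used"],             # efficiency
        1.0 if state["upright"] else 0.0,  # stability
    ])


def verify(fn, sample_states, dim=3):
    """Simplified self-check (an assumption): the candidate must run on every
    sample state and return a finite vector of the expected size."""
    try:
        outputs = [np.asarray(fn(s), dtype=float) for s in sample_states]
    except Exception:
        return False
    return all(o.shape == (dim,) and np.all(np.isfinite(o)) for o in outputs)


def fit_reward_weights(episodes, returns, fn):
    """Least-squares fit of w so that sum_t (w . z_t) matches each episodic
    return, where z_t = fn(state_t) is the latent reward at step t."""
    summed = np.array([np.sum([fn(s) for s in ep], axis=0) for ep in episodes])
    w, *_ = np.linalg.lstsq(summed, np.asarray(returns, dtype=float), rcond=None)
    return w


def redistribute(episode, w, fn):
    """Per-step proxy rewards for one episode: r_t = w . z_t."""
    return np.array([fn(s) @ w for s in episode])
```

Under these assumptions, a caller would first check a candidate function with `verify`, then fit `w` on a batch of episodes with known episodic returns, and finally call `redistribute` to obtain dense per-step rewards for policy training. The low-dimensional latent vector is what keeps the regression well-conditioned in this sketch; a learned nonlinear decoder could be substituted for the linear fit.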
