AACC: Asymmetric Actor-Critic in Contextual Reinforcement Learning

Abstract

Reinforcement Learning (RL) techniques have drawn great attention in many challenging tasks, but their performance deteriorates dramatically when applied to real-world problems. Various methods, such as domain randomization, have been proposed to deal with such situations by training agents under different environmental setups so that they can generalize to different environments during deployment. However, these methods usually do not properly incorporate information about the underlying environmental factors that the agents interact with, and thus can be overly conservative when facing changes in the surroundings. In this paper, we first formalize the task of adapting to changing environmental dynamics in RL as a generalization problem using Contextual Markov Decision Processes (CMDPs). We then propose the Asymmetric Actor-Critic in Contextual RL (AACC) as an end-to-end actor-critic method to deal with such generalization tasks. We experimentally demonstrate substantial improvements in the performance of AACC over existing baselines in a range of simulated environments.
