Context-aware Active Multi-Step Reinforcement Learning

Abstract

Reinforcement learning has attracted great attention recently, especiallypolicy gradient algorithms, which have been demonstrated on challengingdecision making and control tasks. In this paper, we propose an activemulti-step TD algorithm with adaptive stepsizes to learn actor and critic.Specifically, our model consists of two components: active stepsize learningand adaptive multi-step TD algorithm. Firstly, we divide the time horizon intochunks and actively select state and action inside each chunk. Then given theselected samples, we propose the adaptive multi-step TD, which generalizesTD($\lambda$), but adaptively switch on/off the backups from future returns ofdifferent steps. Particularly, the adaptive multi-step TD introduces acontext-aware mechanism, here a binary classifier, which decides whether or notto turn on its future backups based on the context changes. Thus, our model iskind of combination of active learning and multi-step TD algorithm, which hasthe capacity for learning off-policy without the need of importance sampling.We evaluate our approach on both discrete and continuous space tasks in anoff-policy setting respectively, and demonstrate competitive results comparedto other reinforcement learning baselines.

Quick Read (beta)

loading the full paper ...