Reinforcement learning has attracted great attention recently, especially policy gradient algorithms, which have been demonstrated on challenging decision-making and control tasks. In this paper, we propose an active multi-step TD algorithm with adaptive stepsizes to learn the actor and critic. Specifically, our model consists of two components: active stepsize learning and an adaptive multi-step TD algorithm. First, we divide the time horizon into chunks and actively select states and actions inside each chunk. Then, given the selected samples, we propose the adaptive multi-step TD, which generalizes TD($\lambda$) but adaptively switches on or off the backups from future returns of different steps. In particular, the adaptive multi-step TD introduces a context-aware mechanism, here a binary classifier, which decides whether or not to turn on its future backups based on context changes. Our model thus combines active learning with a multi-step TD algorithm and is capable of learning off-policy without the need for importance sampling. We evaluate our approach on both discrete and continuous space tasks in an off-policy setting, and demonstrate competitive results compared to other reinforcement learning baselines.
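
To make the idea of switching future backups on or off concrete, the following Python sketch computes a gated variant of the $\lambda$-return over one chunk. It is only an illustration under stated assumptions and not necessarily the formulation used in this work: the function name `gated_lambda_returns`, the recursive form of the target, and the convention that `gates[t]` holds the binary decision for continuing the backup past step `t + 1` are all hypothetical. In a full system the gates would come from the context-aware binary classifier; here they are passed in as plain 0/1 values.

```python
from typing import List, Sequence


def gated_lambda_returns(
    rewards: Sequence[float],
    values: Sequence[float],
    gates: Sequence[int],
    gamma: float = 0.99,
    lam: float = 0.9,
) -> List[float]:
    """Gated lambda-returns for one chunk of T transitions (illustrative only).

    rewards: r_0 .. r_{T-1}
    values:  V(s_0) .. V(s_T), so len(values) == T + 1
    gates:   T binary values (assumed convention); gates[t] == 1 keeps the
             backup from the return at step t + 1, gates[t] == 0 cuts it and
             bootstraps from V(s_{t+1}) instead.
    """
    T = len(rewards)
    if len(values) != T + 1 or len(gates) != T:
        raise ValueError("expected len(values) == T + 1 and len(gates) == T")

    returns = [0.0] * T
    next_return = values[T]  # bootstrap from the value at the end of the chunk
    for t in reversed(range(T)):
        w = lam * gates[t]  # effective trace weight; zero when the gate is off
        returns[t] = rewards[t] + gamma * ((1.0 - w) * values[t + 1] + w * next_return)
        next_return = returns[t]
    return returns


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    values = [0.0, 0.0, 0.0, 0.0]
    print(gated_lambda_returns(rewards, values, gates=[0, 0, 0], gamma=0.5, lam=1.0))
    print(gated_lambda_returns(rewards, values, gates=[1, 1, 1], gamma=0.5, lam=1.0))
```

With every gate on, the recursion reduces to the standard $\lambda$-return; with every gate off, it reduces to the one-step TD target. Mixed gate patterns truncate the backup wherever the gate is off.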