L-SA: Learning Under-Explored Targets in Multi-Target Reinforcement Learning

Abstract

Tasks that involve interaction with various targets are called multi-target tasks. When general reinforcement learning approaches are applied to such tasks, certain targets that are difficult to access or interact with may be neglected throughout the course of training, a predicament we call the Under-explored Target Problem (UTP). To address this problem, we propose the L-SA (Learning by adaptive Sampling and Active querying) framework, which combines adaptive sampling and active querying. In the L-SA framework, adaptive sampling dynamically samples, at a high proportion, the targets with the highest increase in success rate, resulting in curricular learning from easy to hard targets. Active querying prompts the agent to interact more frequently with under-explored targets that need more experience or exploration. Our experimental results on visual navigation tasks show that the L-SA framework improves sample efficiency as well as success rates on various multi-target tasks with UTP. We also demonstrate experimentally that the cyclic relationship between adaptive sampling and active querying effectively improves the sample richness of under-explored targets and alleviates UTP.
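
The two components described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `favored_prob` parameter, the use of two evaluation windows to measure the increase in success rate, and the use of stored success counts as a proxy for "under-explored" are all assumptions made here for illustration.

```python
import random


def success_rate_gain(prev_rates, curr_rates):
    """Per-target change in success rate between two evaluation windows."""
    return {t: curr_rates[t] - prev_rates[t] for t in curr_rates}


def adaptive_sample(prev_rates, curr_rates, favored_prob=0.8, rng=random):
    """Pick a training target, favoring the one whose success rate rose most.

    favored_prob is a hypothetical knob standing in for the "high
    proportion" mentioned in the abstract; it is not a value from the paper.
    """
    gains = success_rate_gain(prev_rates, curr_rates)
    favored = max(gains, key=gains.get)
    if rng.random() < favored_prob:
        return favored
    # Otherwise fall back to a uniform draw over all targets.
    return rng.choice(sorted(gains))


def active_query(success_counts):
    """Pick the target with the fewest stored successful episodes.

    Assumption: "under-explored" is approximated by a low success count.
    """
    return min(success_counts, key=success_counts.get)


if __name__ == "__main__":
    prev = {"chair": 0.10, "lamp": 0.00}
    curr = {"chair": 0.40, "lamp": 0.05}
    print(adaptive_sample(prev, curr))            # usually the fastest-improving target
    print(active_query({"chair": 12, "lamp": 2}))  # target with the least experience
```

In this sketch the two pieces would alternate: adaptive sampling concentrates training on targets that are currently improving, while active querying directs extra interaction toward the targets with the least collected experience, which is one plausible reading of the cyclic relationship the abstract mentions.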
