Towards Adapting Reinforcement Learning Agents to New Tasks: Insights from Q-Values

Abstract

While contemporary reinforcement learning research and applications have embraced policy gradient methods as the panacea for solving learning problems, value-based methods can still be useful in many domains, as long as we can work out how to exploit them in a sample-efficient way. In this paper, we explore the chaotic nature of DQNs in reinforcement learning, while examining how the information they retain once trained can be repurposed for adapting a model to different tasks. We start by designing a simple experiment in which we are able to observe the Q-values for each state and action in an environment. We then train in eight different ways to explore how these training algorithms affect the way that accurate Q-values are learned (or not learned). We test the adaptability of each trained model when it is retrained to accomplish a slightly modified task. We then scale our setup to test the larger problem of an autonomous vehicle at an unprotected intersection. We observe that the model is able to adapt to new tasks more quickly when the base model's Q-value estimates are closer to the true Q-values. The results provide some insights and guidelines into which algorithms are useful for sample-efficient task adaptation.
