Will it Blend? Composing Value Functions in Reinforcement Learning

Abstract

An important property for lifelong-learning agents is the ability to combine existing skills to solve unseen tasks. In general, however, it is unclear how to compose skills in a principled way. We provide a "recipe" for optimal value function composition in entropy-regularised reinforcement learning (RL) and then extend this to the standard RL setting. Composition is demonstrated in a video game environment, where an agent with an existing library of policies is able to solve new tasks without the need for further learning.
