Accounting for the Sequential Nature of States to Learn Features for Reinforcement Learning

Abstract

In this work, we investigate the properties of data that cause popular representation learning approaches to fail. In particular, we find that in environments where states do not significantly overlap, variational autoencoders (VAEs) fail to learn useful features. We demonstrate this failure in a simple gridworld domain, and then provide a solution in the form of metric learning. However, metric learning requires supervision in the form of a distance function, which is absent in reinforcement learning. To overcome this, we leverage the sequential nature of states in a replay buffer to approximate a distance metric and provide a weak supervision signal, under the assumption that temporally close states are also semantically similar. We modify a VAE with triplet loss and demonstrate that this approach is able to learn useful features for downstream tasks, without additional supervision, in environments where standard VAEs fail.
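The temporal-proximity idea can be illustrated with a minimal sketch. It assumes states are stored in trajectory order, so index distance serves as a proxy for semantic distance. The function names, the window parameters (`pos_window`, `neg_gap`), the squared-Euclidean distance, and the margin value are all hypothetical choices for illustration; the abstract does not specify them. The latent vectors stand in for the outputs of a VAE encoder, and in a full model this term would presumably be added to the usual VAE objective.

```python
import numpy as np


def sample_triplets(num_states, num_triplets, pos_window=1, neg_gap=5, rng=None):
    """Sample (anchor, positive, negative) index triplets from one trajectory.

    Assumption (not from the paper): the positive lies within `pos_window`
    steps of the anchor, and the negative lies at least `neg_gap` steps away.
    """
    if num_states <= neg_gap:
        raise ValueError("trajectory too short for the requested neg_gap")
    rng = np.random.default_rng() if rng is None else rng
    triplets = []
    while len(triplets) < num_triplets:
        a = int(rng.integers(num_states))
        # Temporally close state: a small step forward or backward in time.
        step = int(rng.integers(1, pos_window + 1)) * int(rng.choice([-1, 1]))
        p = a + step
        # Candidate temporally distant state, rejected below if too close.
        n = int(rng.integers(num_states))
        if 0 <= p < num_states and abs(n - a) >= neg_gap:
            triplets.append((a, p, n))
    return np.array(triplets)


def triplet_loss(z_a, z_p, z_n, margin=1.0):
    """Standard triplet margin loss on squared Euclidean latent distances."""
    d_pos = np.sum((z_a - z_p) ** 2, axis=1)
    d_neg = np.sum((z_a - z_n) ** 2, axis=1)
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))
```

Given a matrix `z` of latent codes with one row per state, the loss for a batch of triplets `t` would be `triplet_loss(z[t[:, 0]], z[t[:, 1]], z[t[:, 2]])`. The loss is zero once every negative is farther from its anchor than the positive by at least the margin.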
