Abstract
Model-based reinforcement learning (RL) enjoys several benefits, such as data-efficiency and planning, by learning a model of the environment's dynamics. However, learning a global model that can generalize across different dynamics is a challenging task. To tackle this problem, we decompose the task of learning a global dynamics model into two stages: (a) learning a context latent vector that captures the local dynamics, then (b) predicting the next state conditioned on it. In order to encode dynamics-specific information into the context latent vector, we introduce a novel loss function that encourages the context latent vector to be useful for predicting both forward and backward dynamics. The proposed method achieves superior generalization ability across various simulated robotics and control tasks, compared to existing RL schemes.
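
To make the two-stage decomposition concrete, the following is a minimal sketch of what a combined forward and backward prediction loss could look like. It is an illustration under stated assumptions, not the implementation evaluated in this work: the bias-free linear maps, the dimensions `S`, `A`, `C`, `K`, the weighting `beta`, and all function and parameter names are hypothetical choices made only for this example.

```python
import random


def linear(weights, x):
    """Apply a bias-free linear map: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def init(rows, cols, rng):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def mse(pred, target):
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def context_loss(params, history, s, a, s_next, beta=0.5):
    """Forward plus backward prediction loss for one transition.

    history is a list of past (state, action, next_state) tuples collected
    in the same environment; beta is an assumed weighting, not a value
    taken from the text above.
    """
    # Stage (a): encode recent transitions into a context latent vector.
    flat = [v for (hs, ha, hn) in history for v in hs + ha + hn]
    context = linear(params["encoder"], flat)
    # Stage (b): predict the next state conditioned on the context.
    pred_next = linear(params["forward"], s + a + context)
    # Backward dynamics: recover the current state from the next state.
    pred_prev = linear(params["backward"], s_next + a + context)
    return mse(pred_next, s_next) + beta * mse(pred_prev, s)


rng = random.Random(0)
S, A, C, K = 3, 2, 4, 5  # state, action, context sizes; history length (all assumed)


def rand_vec(n):
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


params = {
    "encoder": init(C, K * (2 * S + A), rng),
    "forward": init(S, S + A + C, rng),
    "backward": init(S, S + A + C, rng),
}
history = [(rand_vec(S), rand_vec(A), rand_vec(S)) for _ in range(K)]
s, a, s_next = rand_vec(S), rand_vec(A), rand_vec(S)

loss = context_loss(params, history, s, a, s_next)
print(loss)
```

Because the same context vector feeds both predictors, minimizing this loss pushes the encoder toward information that explains transitions in both directions. In practice the linear maps would be replaced by neural networks trained with a gradient-based optimizer.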