Abstract
Visual reinforcement learning has proven effective in solving control tasks with high-dimensional observations. However, extracting reliable and generalizable representations from vision-based observations remains a central challenge. Inspired by the human thought process, we posit that a representation is reliable and accurate in comprehending the environment when it can both predict the future and trace back history. Based on this concept, we introduce a Bidirectional Transition (BiT) model, which leverages the ability to predict environmental transitions both forward and backward to extract reliable representations. Our model demonstrates competitive generalization performance and sample efficiency on two settings of the DeepMind Control suite. Additionally, we utilize robotic manipulation and CARLA simulators to demonstrate the wide applicability of our method.