Deep neural network based reinforcement learning (RL) can learn appropriate visual representations for complex tasks like vision-based robotic grasping without the need to manually engineer or separately pre-train a perception system. However, data for RL is collected by running an agent in the desired environment, and for applications like robotics, running a robot in the real world may be extremely costly and time-consuming. Simulated training offers an appealing alternative, but ensuring that policies trained in simulation transfer effectively to the real world requires additional machinery. Simulations may not match reality, and bridging the simulation-to-reality gap typically requires domain knowledge and task-specific engineering. We can automate this process by employing generative models to translate simulated images into realistic ones. However, this sort of translation is typically task-agnostic, in that the translated images may not preserve all features that are relevant to the task. In this paper, we introduce the RL-scene consistency loss for image translation, which ensures that the Q-values associated with an image are invariant under the translation operation. This allows us to learn a task-aware translation. Incorporating this loss into unsupervised domain translation, we obtain RL-CycleGAN, a new approach for simulation-to-real-world transfer for reinforcement learning. In evaluations of RL-CycleGAN on two vision-based robotic grasping tasks, we show that RL-CycleGAN offers a substantial improvement over a number of prior methods for sim-to-real transfer, attaining excellent real-world performance with only a modest number of real-world observations.
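
As a rough illustration of the idea behind the RL-scene consistency loss, the Python sketch below penalizes any change in Q-value caused by translating an image. The names `q_fn`, `translate`, and `q_consistency_penalty`, the squared-error form, and the toy Q-function are all assumptions made for illustration; the abstract does not specify the exact form of the loss, and this is not the RL-CycleGAN implementation.

```python
from typing import Callable, List, Sequence

Image = Sequence[float]  # hypothetical stand-in: a flattened image


def q_consistency_penalty(
    q_fn: Callable[[Image, float], float],
    translate: Callable[[Image], Image],
    images: Sequence[Image],
    actions: Sequence[float],
) -> float:
    """Mean squared gap between Q(image, a) and Q(translate(image), a).

    A value of zero means the translation left every Q-value unchanged,
    i.e. it preserved whatever the Q-function treats as task-relevant.
    """
    total = 0.0
    for image, action in zip(images, actions):
        gap = q_fn(image, action) - q_fn(translate(image), action)
        total += gap * gap
    return total / len(images)


# Toy setup (assumed): the Q-value depends only on total pixel intensity.
def toy_q(image: Image, action: float) -> float:
    return action * sum(image)


def identity(image: Image) -> List[float]:
    return list(image)


def brighten(image: Image) -> List[float]:
    # Changes total intensity, so it alters the toy Q-values.
    return [pixel + 1.0 for pixel in image]


def flip(image: Image) -> List[float]:
    # Reorders pixels but keeps total intensity, so toy Q-values are unchanged.
    return list(reversed(image))


images = [[1.0, 2.0], [0.5, 0.5]]
actions = [1.0, 2.0]

penalty_identity = q_consistency_penalty(toy_q, identity, images, actions)  # 0.0
penalty_brighten = q_consistency_penalty(toy_q, brighten, images, actions)  # 10.0
penalty_flip = q_consistency_penalty(toy_q, flip, images, actions)          # 0.0
```

In this toy setting, `flip` changes the image's appearance without changing any Q-value and so incurs no penalty, while `brighten` alters the feature the Q-function depends on and is penalized. A task-aware translation is meant to favor the first kind of change.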