Composable Deep Reinforcement Learning for Robotic Manipulation

Abstract

Model-free deep reinforcement learning has been shown to exhibit good performance in domains ranging from video games to simulated robotic manipulation and locomotion. However, model-free methods are known to perform poorly when the interaction time with the environment is limited, as is the case for most real-world robotic tasks. In this paper, we study how maximum entropy policies trained using soft Q-learning can be applied to real-world robotic manipulation. The application of this method to real-world manipulation is facilitated by two important features of soft Q-learning. First, soft Q-learning can learn multimodal exploration strategies by learning policies represented by expressive energy-based models. Second, we show that policies learned with soft Q-learning can be composed to create new policies, and that the optimality of the resulting policy can be bounded in terms of the divergence between the composed policies. This compositionality provides an especially valuable tool for real-world manipulation, where constructing new policies by composing existing skills can provide a large gain in efficiency over training from scratch. Our experimental evaluation demonstrates that soft Q-learning is substantially more sample efficient than prior model-free deep reinforcement learning methods, and that compositionality can be performed for both simulated and real-world tasks.
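
To make the compositionality idea concrete, the following is a minimal toy sketch, not the method as specified in the abstract. It assumes a discrete action set, an energy-based policy of the form pi(a) proportional to exp(Q(a) / temperature), and composition by averaging the constituent Q-values. The function names, the two example skills, and their Q-values are hypothetical and chosen only for illustration.

```python
import math

def boltzmann_policy(q_values, temperature=1.0):
    """Energy-based policy over discrete actions: pi(a) proportional to exp(Q(a) / temperature)."""
    # Subtract the maximum before exponentiating for numerical stability.
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]

def compose_q(q_a, q_b):
    """Assumed composition rule: element-wise average of two Q-functions."""
    return [(a + b) / 2.0 for a, b in zip(q_a, q_b)]

# Hypothetical Q-values for three actions in a single state.
q_reach = [2.0, 0.0, 1.0]   # skill 1 prefers action 0
q_avoid = [0.0, 0.0, 2.0]   # skill 2 prefers action 2

q_composed = compose_q(q_reach, q_avoid)
pi_composed = boltzmann_policy(q_composed)

for action, prob in enumerate(pi_composed):
    print(f"action {action}: probability {prob:.3f}")
```

In this toy case the composed policy places most of its probability on the action that both skills rate reasonably well, while keeping nonzero probability on the alternatives, which is the stochastic, multimodal behavior that maximum entropy policies are meant to retain.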
