Abstract
We investigate whether Deep Reinforcement Learning (Deep RL) is able to synthesize sophisticated and safe movement skills for a low-cost, miniature humanoid robot that can be composed into complex behavioral strategies in dynamic environments. We used Deep RL to train a humanoid robot with 20 actuated joints to play a simplified one-versus-one (1v1) soccer game. The resulting agent exhibits robust and dynamic movement skills such as rapid fall recovery, walking, turning, kicking and more; and it transitions between them in a smooth, stable, and efficient manner. The agent's locomotion and tactical behavior adapt to specific game contexts in a way that would be impractical to manually design. The agent also developed a basic strategic understanding of the game, and learned, for instance, to anticipate ball movements and to block opponent shots. Our agent was trained in simulation and transferred to real robots zero-shot. We found that a combination of sufficiently high-frequency control, targeted dynamics randomization, and perturbations during training in simulation enabled good-quality transfer. Although the robots are inherently fragile, basic regularization of the behavior during training led the robots to learn safe and effective movements while still performing in a dynamic and agile way, well beyond what is intuitively expected from the robot. Indeed, in experiments, they walked 181% faster, turned 302% faster, took 63% less time to get up, and kicked a ball 34% faster than a scripted baseline, while efficiently combining the skills to achieve the longer-term objectives.
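The reported percentages can be read as ratios against the scripted baseline. The following is a minimal sketch, assuming the usual convention that "X% faster" means (1 + X/100) times the baseline rate and "X% less time" means (1 - X/100) times the baseline duration. The function and variable names are illustrative and do not come from the paper.

```python
def rate_ratio(percent_faster: float) -> float:
    """Multiple of the baseline rate implied by 'X% faster' (assumed convention)."""
    return 1.0 + percent_faster / 100.0


def time_ratio(percent_less_time: float) -> float:
    """Fraction of the baseline duration implied by 'X% less time' (assumed convention)."""
    return 1.0 - percent_less_time / 100.0


# Percentages as stated in the abstract; the ratios are derived, not measured.
reported = {
    "walking speed": rate_ratio(181),
    "turning speed": rate_ratio(302),
    "get-up time": time_ratio(63),
    "kick speed": rate_ratio(34),
}

for skill, ratio in reported.items():
    print(f"{skill}: {ratio:.2f} x scripted baseline")
```

Under this reading, the learned policy walks at roughly 2.8 times the baseline speed and gets up in a little over a third of the baseline time.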