Jointly Learning to Construct and Control Agents using Deep Reinforcement Learning

Abstract

The physical design of a robot and the policy that controls its motion are inherently coupled, and should be determined according to the task and environment. In an increasing number of applications, data-driven and learning-based approaches, such as deep reinforcement learning, have proven effective at designing control policies. For most tasks, the only way to evaluate a physical design with respect to such control policies is empirical--i.e., by picking a design and training a control policy for it. Since training these policies is time-consuming, it is computationally infeasible to train separate policies for all possible designs as a means to identify the best one. In this work, we address this limitation by introducing a method that performs simultaneous joint optimization of the physical design and control network. Our approach maintains a distribution over designs and uses reinforcement learning to optimize a control policy to maximize expected reward over the design distribution. We give the controller access to design parameters to allow it to tailor its policy to each design in the distribution. Throughout training, we shift the distribution towards higher-performing designs, eventually converging to a design and control policy that are jointly optimal. We evaluate our approach in the context of legged locomotion, and demonstrate that it discovers novel designs and walking gaits, outperforming baselines in both performance and efficiency.
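
The loop described in the abstract (sample designs from a distribution, train a design-conditioned controller on them, shift the distribution towards better designs) can be illustrated with a toy sketch. This is not the authors' implementation: the one-dimensional design, the Gaussian design distribution with fixed spread, the linear controller, the hand-written `reward` function, and the plain score-function (REINFORCE-style) updates are all assumptions chosen to keep the example small. The names `TARGET_DESIGN`, `reward`, and `train` are hypothetical.

```python
import random

TARGET_DESIGN = 2.0   # hypothetical best design parameter (e.g. a leg length)
DESIGN_STD = 0.3      # fixed spread of the Gaussian design distribution (assumption)
ACTION_STD = 0.3      # fixed exploration noise of the controller (assumption)


def reward(design, action):
    # Toy stand-in for a simulator rollout: the design should be close to
    # TARGET_DESIGN, and the best action depends on the sampled design.
    return -(design - TARGET_DESIGN) ** 2 - (action - 0.5 * design) ** 2


def train(iters=500, batch=64, lr=0.05, seed=0):
    rng = random.Random(seed)
    mu = 0.0                   # mean of the design distribution
    theta0, theta1 = 0.0, 0.0  # controller: mean action = theta0 + theta1 * design
    for _ in range(iters):
        samples = []
        for _ in range(batch):
            design = rng.gauss(mu, DESIGN_STD)           # sample a design
            mean_action = theta0 + theta1 * design       # controller sees the design
            action = rng.gauss(mean_action, ACTION_STD)  # explore around the mean
            samples.append((design, mean_action, action, reward(design, action)))

        # Batch-mean baseline to reduce the variance of the gradient estimates.
        baseline = sum(s[3] for s in samples) / batch
        g_mu = g0 = g1 = 0.0
        for design, mean_action, action, r in samples:
            adv = r - baseline
            # Score-function gradient w.r.t. the design distribution mean.
            g_mu += adv * (design - mu) / DESIGN_STD ** 2
            # Score-function gradient w.r.t. the controller parameters.
            score = (action - mean_action) / ACTION_STD ** 2
            g0 += adv * score
            g1 += adv * score * design

        # Joint update: shift the design distribution and improve the controller.
        mu += lr * g_mu / batch
        theta0 += lr * g0 / batch
        theta1 += lr * g1 / batch
    return mu, theta0, theta1
```

Calling `mu, theta0, theta1 = train()` runs the joint loop; in this toy setting the design mean `mu` should drift towards `TARGET_DESIGN` while the controller's mean action at that design approaches `0.5 * mu`. A single controller shared across all sampled designs is what avoids training one policy per candidate design.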
