Distilling Morphology-Conditioned Hypernetworks for Efficient Universal Morphology Control

Abstract

Learning a universal policy across different robot morphologies can significantly improve learning efficiency and enable zero-shot generalization to unseen morphologies. However, learning a highly performant universal policy requires sophisticated architectures like transformers (TF) that have larger memory and computational cost than simpler multi-layer perceptrons (MLP). To achieve both good performance like TF and high efficiency like MLP at inference time, we propose HyperDistill, which consists of: (1) A morphology-conditioned hypernetwork (HN) that generates robot-wise MLP policies, and (2) A policy distillation approach that is essential for successful training. We show that on UNIMAL, a benchmark with hundreds of diverse morphologies, HyperDistill performs as well as a universal TF teacher policy on both training and unseen test robots, but reduces model size by 6-14 times, and computational cost by 67-160 times in different environments. Our analysis attributes the efficiency advantage of HyperDistill at inference time to knowledge decoupling, i.e., the ability to decouple inter-task and intra-task knowledge, a general principle that could also be applied to improve inference efficiency in other domains.
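The two components named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the layer sizes, the single linear hypernetwork, the mean-squared-error distillation loss, and all names (`generate_policy`, `mlp_policy`, `distill_loss`, `HN_WEIGHT`) are assumptions chosen for illustration. The sketch shows one plausible reading of knowledge decoupling, where the hypernetwork is evaluated once per robot and each control step then runs only the small generated MLP.

```python
import math
import random

random.seed(0)

# Illustrative sizes only; none of these values come from the paper.
CTX_DIM, OBS_DIM, HID_DIM, ACT_DIM = 4, 6, 8, 2
N_PARAMS = OBS_DIM * HID_DIM + HID_DIM + HID_DIM * ACT_DIM + ACT_DIM

# Hypothetical hypernetwork: one linear map from morphology context to flat MLP parameters.
HN_WEIGHT = [[random.gauss(0.0, 0.1) for _ in range(N_PARAMS)] for _ in range(CTX_DIM)]


def generate_policy(morph_ctx):
    """Run once per robot: map the morphology context to that robot's MLP parameters."""
    flat = [sum(c * row[j] for c, row in zip(morph_ctx, HN_WEIGHT)) for j in range(N_PARAMS)]
    it = iter(flat)
    w1 = [[next(it) for _ in range(HID_DIM)] for _ in range(OBS_DIM)]
    b1 = [next(it) for _ in range(HID_DIM)]
    w2 = [[next(it) for _ in range(ACT_DIM)] for _ in range(HID_DIM)]
    b2 = [next(it) for _ in range(ACT_DIM)]
    return w1, b1, w2, b2


def mlp_policy(params, obs):
    """Per-step inference: uses only the generated MLP, not the hypernetwork."""
    w1, b1, w2, b2 = params
    hidden = [
        math.tanh(sum(o * w1[i][h] for i, o in enumerate(obs)) + b1[h])
        for h in range(HID_DIM)
    ]
    return [sum(x * w2[h][a] for h, x in enumerate(hidden)) + b2[a] for a in range(ACT_DIM)]


def distill_loss(student_action, teacher_action):
    """Mean squared error between student and teacher actions (an assumed objective)."""
    return sum((s - t) ** 2 for s, t in zip(student_action, teacher_action)) / len(student_action)


morph_ctx = [random.gauss(0.0, 1.0) for _ in range(CTX_DIM)]
params = generate_policy(morph_ctx)   # computed once per robot
obs = [random.gauss(0.0, 1.0) for _ in range(OBS_DIM)]
action = mlp_policy(params, obs)      # computed at every control step
teacher_action = [0.0] * ACT_DIM      # placeholder standing in for the TF teacher's output
loss = distill_loss(action, teacher_action)
```

In this sketch the per-step cost depends only on the MLP sizes, while the cost of `generate_policy` is paid once per morphology, which is the kind of separation the abstract credits for the inference-time savings.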
