MENTOR: Mixture-of-Experts Network with Task-Oriented Perturbation for Visual Reinforcement Learning

Abstract

Visual deep reinforcement learning (RL) enables robots to acquire skills from visual input for unstructured tasks. However, current algorithms suffer from low sample efficiency, limiting their practical applicability. In this work, we present MENTOR, a method that improves both the architecture and optimization of RL agents. Specifically, MENTOR replaces the standard multi-layer perceptron (MLP) with a mixture-of-experts (MoE) backbone, enhancing the agent's ability to handle complex tasks by leveraging modular expert learning to avoid gradient conflicts. Furthermore, MENTOR introduces a task-oriented perturbation mechanism, which heuristically samples perturbation candidates containing task-relevant information, leading to more targeted and effective optimization. MENTOR outperforms state-of-the-art methods across three simulation domains: DeepMind Control Suite, Meta-World, and Adroit. Additionally, MENTOR achieves an average of 83% success rate on three challenging real-world robotic manipulation tasks including peg insertion, cable routing, and tabletop golf, which significantly surpasses the success rate of 32% from the current strongest model-free visual RL algorithm. These results underscore the importance of sample efficiency in advancing visual RL for real-world robotics. Experimental videos are available at https://suninghuang19.github.io/mentor_page.
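
To make the idea of swapping an MLP for a mixture-of-experts backbone concrete, the following is a minimal NumPy sketch of a generic sparsely routed MoE layer. It is not MENTOR's implementation: the abstract does not specify the routing scheme, the number of experts, or the expert architecture, so the top-k softmax gating, the single-linear-layer experts with ReLU, and all names (`moe_forward`, `gate_w`, `expert_ws`, `top_k`) are assumptions chosen for illustration.

```python
import numpy as np


def moe_forward(x, gate_w, expert_ws, top_k=2):
    """Generic top-k mixture-of-experts layer (illustrative, not MENTOR's code).

    x:         (batch, d_in) input features
    gate_w:    (d_in, n_experts) router weights (hypothetical name)
    expert_ws: (n_experts, d_in, d_out) one linear expert per slice (hypothetical)
    """
    # Router scores: one logit per expert for every sample.
    logits = x @ gate_w                                    # (batch, n_experts)

    # Indices of the top_k highest-scoring experts per sample.
    top_idx = np.argsort(logits, axis=1)[:, -top_k:]       # (batch, top_k)
    top_logits = np.take_along_axis(logits, top_idx, axis=1)

    # Softmax restricted to the selected experts (numerically stabilized).
    top_logits = top_logits - top_logits.max(axis=1, keepdims=True)
    weights = np.exp(top_logits)
    weights = weights / weights.sum(axis=1, keepdims=True)  # rows sum to 1

    # Apply only the selected experts: gather their weights, then project.
    selected = expert_ws[top_idx]                  # (batch, top_k, d_in, d_out)
    expert_out = np.einsum("bi,bkio->bko", x, selected)
    expert_out = np.maximum(expert_out, 0.0)       # ReLU inside each expert

    # Combine expert outputs using the gating weights.
    out = np.einsum("bk,bko->bo", weights, expert_out)     # (batch, d_out)
    return out, top_idx, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 8))
    gate_w = rng.normal(size=(8, 4))
    expert_ws = rng.normal(size=(4, 8, 3))
    out, top_idx, weights = moe_forward(x, gate_w, expert_ws, top_k=2)
    print(out.shape, top_idx.shape)
```

Because each sample activates only a subset of experts, a gradient step for that sample updates only those experts' parameters (plus the router). This is one common intuition for how modular expert learning can reduce gradient conflicts between dissimilar inputs.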
