Abstract
Human-in-the-loop reinforcement learning integrates human expertise to accelerate agent learning and provide critical guidance and feedback in complex fields. However, many existing approaches focus on single-agent tasks and require continuous human involvement during the training process, significantly increasing the human workload and limiting scalability. In this paper, we propose HARP (Human-Assisted Regrouping with Permutation Invariant Critic), a multi-agent reinforcement learning framework designed for group-oriented tasks. HARP integrates automatic agent regrouping with strategic human assistance during deployment, allowing non-experts to offer effective guidance with minimal intervention. During training, agents dynamically adjust their groupings to optimize collaborative task completion. When deployed, they actively seek human assistance and use the Permutation Invariant Group Critic to evaluate and refine human-proposed groupings, allowing non-expert users to contribute valuable suggestions. In multiple collaboration scenarios, our approach leverages limited guidance from non-experts to enhance performance. The project can be found at https://github.com/huawen-hu/HARP.