Model-based Reinforcement Learning for Decentralized Multiagent Rendezvous

Abstract

Collaboration requires agents to align their goals on the fly. Underlying the human ability to align goals with other agents is their ability to predict the intentions of others and actively update their own plans. We propose hierarchical predictive planning (HPP), a model-based reinforcement learning method for decentralized multiagent rendezvous. Starting with pretrained, single-agent point-to-point navigation policies and using noisy, high-dimensional sensor inputs such as lidar, we first learn, via self-supervision, motion predictions of all agents on the team. Next, HPP uses the prediction models to propose and evaluate navigation subgoals for completing the rendezvous task without explicit communication among agents. We evaluate HPP in a suite of unseen environments with increasing complexity and numbers of obstacles. We show that HPP outperforms alternative reinforcement learning, path planning, and heuristic-based baselines on challenging, unseen environments. Experiments in the real world demonstrate successful transfer of the prediction models from simulation to the real world without any additional fine-tuning. Altogether, HPP removes the need for a centralized operator in multiagent systems by combining model-based RL and inference methods, enabling agents to dynamically align plans.
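The idea of using prediction models to propose and evaluate rendezvous subgoals can be illustrated with a toy sketch. This is not the HPP implementation: the learned motion-prediction model is replaced by a hypothetical `predict_travel_time` that assumes straight-line motion at constant speed with no obstacles, candidate subgoals come from a fixed grid, and each candidate is scored by the predicted arrival time of the slowest agent. Because every agent runs the same deterministic evaluation on the same (assumed shared) observations, all agents pick the same subgoal without exchanging messages.

```python
import numpy as np


def predict_travel_time(start, goal, speed=1.0):
    """Hypothetical stand-in for a learned motion-prediction model:
    straight-line distance divided by constant speed, ignoring obstacles."""
    return np.linalg.norm(goal - start) / speed


def choose_subgoal(agent_positions, candidates, speeds):
    """Score each candidate rendezvous point by the predicted time until the
    slowest agent arrives, and return the best candidate and its cost."""
    costs = []
    for c in candidates:
        times = [predict_travel_time(p, c, s) for p, s in zip(agent_positions, speeds)]
        costs.append(max(times))
    best = int(np.argmin(costs))  # first minimum wins, so ties break deterministically
    return candidates[best], costs[best]


# Hypothetical scenario: three agents and a 5x5 grid of candidate subgoals.
positions = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 4.0]])
speeds = [1.0, 1.0, 1.0]
xs, ys = np.meshgrid(np.arange(5.0), np.arange(5.0))
candidates = np.stack([xs.ravel(), ys.ravel()], axis=1)

# Each agent plans independently on its own copy of the observations.
picks = [choose_subgoal(positions.copy(), candidates, speeds) for _ in positions]
for goal, cost in picks:
    print(goal, round(cost, 3))
```

In this toy setup agreement holds only because every agent sees identical inputs; with noisy, agent-specific observations, the quality of the prediction models becomes what keeps the agents' plans aligned.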
