PoliFormer: Scaling On-Policy RL with Transformers Results in Masterful Navigators

  • 2024-06-28 18:51:10
  • Kuo-Hao Zeng, Zichen Zhang, Kiana Ehsani, Rose Hendrix, Jordi Salvador, Alvaro Herrasti, Ross Girshick, Aniruddha Kembhavi, Luca Weihs

Abstract

We present PoliFormer (Policy Transformer), an RGB-only indoor navigation agent trained end-to-end with reinforcement learning at scale that generalizes to the real world without adaptation despite being trained purely in simulation. PoliFormer uses a foundational vision transformer encoder with a causal transformer decoder enabling long-term memory and reasoning. It is trained for hundreds of millions of interactions across diverse environments, leveraging parallelized, multi-machine rollouts for efficient training with high throughput. PoliFormer is a masterful navigator, producing state-of-the-art results across two distinct embodiments, the LoCoBot and Stretch RE-1 robots, and four navigation benchmarks. It breaks through the plateaus of previous work, achieving an unprecedented 85.5% success rate in object goal navigation on the CHORES-S benchmark, a 28.5% absolute improvement. PoliFormer can also be trivially extended to a variety of downstream applications such as object tracking, multi-object navigation, and open-vocabulary navigation with no finetuning.
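To make the "causal transformer decoder" idea concrete, the following is a minimal sketch of single-head causal attention over a sequence of per-step features. It is not PoliFormer's implementation: the function name `causal_attention`, the toy feature vectors, and the choice of plain Python lists are all assumptions made for illustration. It shows only the general property that the output at step t depends on steps 0..t and never on later ones, which is what lets such a decoder act as a memory over the agent's past observations.

```python
import math


def causal_attention(queries, keys, values):
    """Single-head scaled dot-product attention with a causal mask.

    Hypothetical helper for illustration only. Each argument is a list of
    T vectors (lists of floats), one per timestep. The output at step t is
    a weighted average of values[0..t]; later steps are never visible.
    """
    dim = len(queries[0])
    outputs = []
    for t, q in enumerate(queries):
        # Causal mask: score only the current and earlier timesteps.
        scores = [
            sum(a * b for a, b in zip(q, keys[s])) / math.sqrt(dim)
            for s in range(t + 1)
        ]
        # Numerically stable softmax over the visible steps.
        top = max(scores)
        weights = [math.exp(x - top) for x in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted average of the visible value vectors.
        outputs.append([
            sum(w * values[s][d] for s, w in enumerate(weights))
            for d in range(len(values[0]))
        ])
    return outputs


# Toy per-step features standing in for encoded observations (assumed values).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = causal_attention(features, features, features)
print(attended)
```

Because the mask hides future steps, changing the last observation leaves the outputs for all earlier steps unchanged, so a policy built this way can be rolled out one step at a time.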

 
