TD-MPC2: Scalable, Robust World Models for Continuous Control

  • 2024-03-21 18:56:19
  • Nicklas Hansen, Hao Su, Xiaolong Wang

Abstract

TD-MPC is a model-based reinforcement learning (RL) algorithm that performs local trajectory optimization in the latent space of a learned implicit (decoder-free) world model. In this work, we present TD-MPC2: a series of improvements upon the TD-MPC algorithm. We demonstrate that TD-MPC2 improves significantly over baselines across 104 online RL tasks spanning 4 diverse task domains, achieving consistently strong results with a single set of hyperparameters. We further show that agent capabilities increase with model and data size, and successfully train a single 317M parameter agent to perform 80 tasks across multiple task domains, embodiments, and action spaces. We conclude with an account of lessons, opportunities, and risks associated with large TD-MPC2 agents. Explore videos, models, data, code, and more at https://tdmpc2.com

 
