Abstract
TD-MPC is a model-based reinforcement learning (RL) algorithm that performs local trajectory optimization in the latent space of a learned implicit (decoder-free) world model. In this work, we present TD-MPC2: a series of improvements upon the TD-MPC algorithm. We demonstrate that TD-MPC2 improves significantly over baselines across 104 online RL tasks spanning 4 diverse task domains, achieving consistently strong results with a single set of hyperparameters. We further show that agent capabilities increase with model and data size, and successfully train a single 317M parameter agent to perform 80 tasks across multiple task domains, embodiments, and action spaces. We conclude with an account of lessons, opportunities, and risks associated with large TD-MPC2 agents. Explore videos, models, data, code, and more at https://tdmpc2.com