Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control

  • 2025-03-18 18:57:54
  • NVIDIA: Hassan Abu Alhaija, Jose Alvarez, Maciej Bala, Tiffany Cai, Tianshi Cao, Liz Cha, Joshua Chen, Mike Chen, Francesco Ferroni, Sanja Fidler, Dieter Fox, Yunhao Ge, Jinwei Gu, Ali Hassani, Michael Isaev, Pooya Jannaty, Shiyi Lan, Tobias Lasser, Huan Ling, Ming-Yu Liu, Xian Liu, Yifan Lu, Alice Luo, Qianli Ma, Hanzi Mao, Fabio Ramos, Xuanchi Ren, Tianchang Shen, Shitao Tang, Ting-Chun Wang, Jay Wu, Jiashu Xu, Stella Xu, Kevin Xie, Yuchong Ye, Xiaodong Yang, Xiaohui Zeng, Yu Zeng

Abstract

We introduce Cosmos-Transfer, a conditional world generation model that can generate world simulations based on multiple spatial control inputs of various modalities, such as segmentation, depth, and edge maps. The spatial conditioning scheme is adaptive and customizable: different conditional inputs can be weighted differently at different spatial locations. This enables highly controllable world generation and supports various world-to-world transfer use cases, including Sim2Real. We conduct extensive evaluations to analyze the proposed model and demonstrate its applications for Physical AI, including robotics Sim2Real and autonomous vehicle data enrichment. We further demonstrate an inference scaling strategy that achieves real-time world generation on an NVIDIA GB200 NVL72 rack. To help accelerate research and development in the field, we open-source our models and code at https://github.com/nvidia-cosmos/cosmos-transfer1.
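The idea of weighting different control modalities differently at each spatial location can be illustrated with a toy example. The sketch below is not the Cosmos-Transfer1 implementation: the function name `fuse_controls`, the use of small list-of-lists grids, and the choice to normalize the weights at each location so they sum to one are all assumptions made for illustration. A real model would apply such weights to learned network features rather than to raw scalar grids.

```python
from typing import Dict, List

# A 2D grid of scalars, indexed as grid[row][col].
Grid = List[List[float]]


def fuse_controls(features: Dict[str, Grid], weights: Dict[str, Grid]) -> Grid:
    """Blend per-modality control signals using per-location weights.

    Hypothetical sketch: at each location, the weights across modalities are
    normalized to sum to 1 (an assumption). If every weight at a location is
    zero, that location receives no control signal (value 0.0).
    """
    names = list(features)
    height = len(features[names[0]])
    width = len(features[names[0]][0])
    fused = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            # Total weight assigned to this location across all modalities.
            total = sum(weights[m][i][j] for m in names)
            if total == 0:
                continue  # no modality controls this location
            # Weighted average of the modality signals at this location.
            fused[i][j] = sum(weights[m][i][j] * features[m][i][j] for m in names) / total
    return fused


if __name__ == "__main__":
    # Depth dominates the left location; edge dominates the right one.
    out = fuse_controls(
        {"depth": [[1.0, 1.0]], "edge": [[0.0, 0.0]]},
        {"depth": [[1.0, 0.25]], "edge": [[0.0, 0.75]]},
    )
    print(out)
```

Running the example prints `[[1.0, 0.25]]`: the left location follows the depth signal entirely, while the right location is mostly driven by the edge signal. Letting the weight maps vary by location is what allows, for example, one modality to control the foreground and another the background.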
