Abstract
We introduce Cosmos-Transfer, a conditional world generation model that can generate world simulations based on multiple spatial control inputs of various modalities such as segmentation, depth, and edge. In the design, the spatial conditional scheme is adaptive and customizable. It allows weighting different conditional inputs differently at different spatial locations. This enables highly controllable world generation and finds use in various world-to-world transfer use cases, including Sim2Real. We conduct extensive evaluations to analyze the proposed model and demonstrate its applications for Physical AI, including robotics Sim2Real and autonomous vehicle data enrichment. We further demonstrate an inference scaling strategy to achieve real-time world generation with an NVIDIA GB200 NVL72 rack. To help accelerate research development in the field, we open-source our models and code at https://github.com/nvidia-cosmos/cosmos-transfer1.
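To illustrate what weighting conditional inputs differently at different spatial locations could look like, the following is a minimal sketch and not the released implementation. The helper `fuse_controls`, the scalar-per-location feature maps, and the rule of rescaling weights only when they sum above 1 at a location are all assumptions made for illustration.

```python
from typing import Dict, List

Grid = List[List[float]]  # H x W map, one value per spatial location


def fuse_controls(features: Dict[str, Grid], weights: Dict[str, Grid]) -> Grid:
    """Blend per-modality feature maps with per-location weights (hypothetical helper)."""
    names = list(features)
    height = len(features[names[0]])
    width = len(features[names[0]][0])
    fused = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = sum(weights[n][y][x] for n in names)
            # Assumption: rescale only when the weights at a location sum above 1.
            scale = 1.0 / total if total > 1.0 else 1.0
            fused[y][x] = sum(
                scale * weights[n][y][x] * features[n][y][x] for n in names
            )
    return fused


# Toy 1x2 example: the left location trusts depth, the right location trusts edge.
features = {"depth": [[2.0, 2.0]], "edge": [[4.0, 4.0]]}
weights = {"depth": [[1.0, 0.0]], "edge": [[0.0, 1.0]]}
fused = fuse_controls(features, weights)
```

In this toy setup each location takes its value from a single modality. With overlapping weights, the assumed rescaling keeps the combined contribution at any location from exceeding that of one fully weighted input.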