Abstract
Large-scale, diverse robot datasets have emerged as a promising path toward enabling dexterous manipulation policies to generalize to novel environments, but acquiring such datasets presents many challenges. While teleoperation provides high-fidelity datasets, its high cost limits its scalability. Instead, what if people could use their own hands, just as they do in everyday life, to collect data? In DexWild, a diverse team of data collectors uses their hands to collect hours of interactions across a multitude of environments and objects. To record this data, we create DexWild-System, a low-cost, mobile, and easy-to-use device. The DexWild learning framework co-trains on both human and robot demonstrations, leading to improved performance compared to training on each dataset individually. This combination results in robust robot policies capable of generalizing to novel environments, tasks, and embodiments with minimal additional robot-specific data. Experimental results demonstrate that DexWild significantly improves performance, achieving a 68.5% success rate in unseen environments (nearly four times higher than policies trained with robot data only) and offering 5.8x better cross-embodiment generalization. Video results, codebases, and instructions are available at https://dexwild.github.io