Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

  • 2024-09-25 22:55:38
  • Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, Eugene Byrne, Zach Chavis, Joya Chen, Feng Cheng, Fu-Jen Chu, Sean Crane, Avijit Dasgupta, Jing Dong, Maria Escobar, Cristhian Forigua, Abrham Gebreselasie, Sanjay Haresh, Jing Huang, Md Mohaiminul Islam, Suyog Jain, Rawal Khirodkar, Devansh Kukreja, Kevin J Liang, Jia-Wei Liu, Sagnik Majumder, Yongsen Mao, Miguel Martin, Effrosyni Mavroudi, Tushar Nagarajan, Francesco Ragusa, Santhosh Kumar Ramakrishnan, Luigi Seminara, Arjun Somayazulu, Yale Song, Shan Su, Zihui Xue, Edward Zhang, Jinxu Zhang, Angela Castillo, Changan Chen, Xinzhu Fu, Ryosuke Furuta, Cristina Gonzalez, Prince Gupta, Jiabo Hu, Yifei Huang, Yiming Huang, Wesli

Abstract

We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric and exocentric video of skilled human activities (e.g., sports, music, dance, bike repair). 740 participants from 13 cities worldwide performed these activities in 123 different natural scene contexts, yielding long-form captures from 1 to 42 minutes each and 1,286 hours of video combined. The multimodal nature of the dataset is unprecedented: the video is accompanied by multichannel audio, eye gaze, 3D point clouds, camera poses, IMU, and multiple paired language descriptions -- including a novel "expert commentary" done by coaches and teachers and tailored to the skilled-activity domain. To push the frontier of first-person video understanding of skilled human activity, we also present a suite of benchmark tasks and their annotations, including fine-grained activity understanding, proficiency estimation, cross-view translation, and 3D hand/body pose. All resources are open sourced to fuel new research in the community. Project page: http://ego-exo4d-data.org/

 
