Abstract
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It offers 3,025 hours of daily-life activity video spanning hundreds of scenarios (household, outdoor, workplace, leisure, etc.) captured by 855 unique camera wearers from 74 worldwide locations and 9 different countries. The approach to collection is designed to uphold rigorous privacy and ethics standards with consenting participants and robust de-identification procedures where relevant. Ego4D dramatically expands the volume of diverse egocentric video footage publicly available to the research community. Portions of the video are accompanied by audio, 3D meshes of the environment, eye gaze, stereo, and/or synchronized videos from multiple egocentric cameras at the same event. Furthermore, we present a host of new benchmark challenges centered around understanding the first-person visual experience in the past (querying an episodic memory), present (analyzing hand-object manipulation, audio-visual conversation, and social interactions), and future (forecasting activities). By publicly sharing this massive annotated dataset and benchmark suite, we aim to push the frontier of first-person perception. Project page: https://ego4d-data.org/