MVImgNet: A Large-scale Dataset of Multi-view Images

Abstract

Being data-driven is one of the most iconic properties of deep learning algorithms. The birth of ImageNet drove a remarkable trend of "learning from large-scale data" in computer vision. Pretraining on ImageNet to obtain rich universal representations has been shown to benefit various 2D visual tasks and has become a standard in 2D vision. However, because collecting real-world 3D data is laborious, there is not yet a generic dataset serving as a counterpart of ImageNet in 3D vision, so how such a dataset could impact the 3D community remains unexplored. To remedy this, we introduce MVImgNet, a large-scale dataset of multi-view images, which is highly convenient to collect by shooting videos of real-world objects in daily life. It contains 6.5 million frames from 219,188 videos covering objects from 238 classes, with rich annotations of object masks, camera parameters, and point clouds. The multi-view attribute endows our dataset with 3D-aware signals, making it a soft bridge between 2D and 3D vision. We conduct pilot studies to probe the potential of MVImgNet on a variety of 3D and 2D visual tasks, including radiance field reconstruction, multi-view stereo, and view-consistent image understanding, where MVImgNet demonstrates promising performance, leaving many possibilities for future exploration. In addition, via dense reconstruction on MVImgNet, we derive a 3D object point cloud dataset called MVPNet, covering 87,200 samples from 150 categories, with a class label on each point cloud. Experiments show that MVPNet can benefit real-world 3D object classification while posing new challenges to point cloud understanding. MVImgNet and MVPNet will be publicly available, and we hope they will inspire the broader vision community.
