Abstract
We propose MeshLRM, a novel LRM-based approach that can reconstruct a high-quality mesh from merely four input images in less than one second. Different from previous large reconstruction models (LRMs) that focus on NeRF-based reconstruction, MeshLRM incorporates differentiable mesh extraction and rendering within the LRM framework. This allows for end-to-end mesh reconstruction by fine-tuning a pre-trained NeRF LRM with mesh rendering. Moreover, we improve the LRM architecture by simplifying several complex designs in previous LRMs. MeshLRM's NeRF initialization is sequentially trained with low- and high-resolution images; this new LRM training strategy enables significantly faster convergence and thereby leads to better quality with less compute. Our approach achieves state-of-the-art mesh reconstruction from sparse-view inputs and also allows for many downstream applications, including text-to-3D and single-image-to-3D generation. Project page: https://sarahweiii.github.io/meshlrm/