Abstract
Existing 2D methods use UNet-based diffusion models to generate multi-view physically-based rendering (PBR) maps but struggle with multi-view inconsistency, while some 3D methods generate UV maps directly and generalize poorly because 3D data is limited. To address these problems, we propose a two-stage approach consisting of multi-view generation and UV material refinement. In the generation stage, we adopt a Diffusion Transformer (DiT) model to generate PBR materials, where both the specially designed multi-branch DiT and the reference-based DiT blocks use a global attention mechanism to promote feature interaction and fusion between different views, thereby improving multi-view consistency. In addition, we adopt a PBR-based diffusion loss to ensure that the generated materials align with realistic physical principles. In the refinement stage, we propose a material-refined DiT that performs inpainting in empty areas and enhances details in UV space. In addition to the normal condition, this refinement takes the material map from the generation stage as a further condition to reduce the learning difficulty and improve generalization. Extensive experiments show that our method achieves state-of-the-art performance in texturing 3D objects with PBR materials and provides significant advantages for graphics relighting applications. Project Page: https://lingtengqiu.github.io/2024/MCMat/