Nabla-R2D3: Effective and Efficient 3D Diffusion Alignment with 2D Rewards

Abstract

Generating high-quality and photorealistic 3D assets remains a longstanding challenge in 3D vision and computer graphics. Although state-of-the-art generative models, such as diffusion models, have made significant progress in 3D generation, they often fall short of human-designed content due to limited ability to follow instructions, align with human preferences, or produce realistic textures, geometries, and physical attributes. In this paper, we introduce Nabla-R2D3, a highly effective and sample-efficient reinforcement learning alignment framework for 3D-native diffusion models using 2D rewards. Built upon the recently proposed Nabla-GFlowNet method, which matches the score function to reward gradients in a principled manner for reward finetuning, our Nabla-R2D3 enables effective adaptation of 3D diffusion models using only 2D reward signals. Extensive experiments show that, unlike vanilla finetuning baselines which either struggle to converge or suffer from reward hacking, Nabla-R2D3 consistently achieves higher rewards and reduced prior forgetting within a few finetuning steps.
