Abstract
In the field of digital content creation, generating high-quality 3D characters from single images is challenging, especially given the complexities of various body poses and the issues of self-occlusion and pose ambiguity. In this paper, we present CharacterGen, a framework developed to efficiently generate 3D characters. CharacterGen introduces a streamlined generation pipeline along with an image-conditioned multi-view diffusion model. This model effectively calibrates input poses to a canonical form while retaining key attributes of the input image, thereby addressing the challenges posed by diverse poses. A transformer-based, generalizable sparse-view reconstruction model is the other core component of our approach, facilitating the creation of detailed 3D models from multi-view images. We also adopt a texture-back-projection strategy to produce high-quality texture maps. Additionally, we have curated a dataset of anime characters, rendered in multiple poses and views, to train and evaluate our model. Our approach has been thoroughly evaluated through quantitative and qualitative experiments, showing its proficiency in generating 3D characters with high-quality shapes and textures, ready for downstream applications such as rigging and animation.