Abstract
Although diffusion models can generate high-quality human images, their applications are limited by the instability in generating hands with correct structures. In this paper, we introduce RHanDS, a conditional diffusion-based framework designed to refine malformed hands by utilizing decoupled structure and style guidance. The hand mesh reconstructed from the malformed hand offers structure guidance for correcting the structure of the hand, while the malformed hand itself provides style guidance for preserving the style of the hand. To alleviate the mutual interference between style and structure guidance, we introduce a two-stage training strategy and build a series of multi-style hand datasets. In the first stage, we use paired hand images for training to ensure stylistic consistency in hand refining. In the second stage, various hand images generated based on human meshes are used for training, enabling the model to gain control over the hand structure. Experimental results demonstrate that RHanDS can effectively refine hand structure while preserving consistency in hand style.