Abstract
We present Hierarchical Motion Representation (HiMoR), a novel deformation representation for 3D Gaussian primitives capable of achieving high-quality monocular dynamic 3D reconstruction. The insight behind HiMoR is that motions in everyday scenes can be decomposed into coarser motions that serve as the foundation for finer details. HiMoR uses a tree structure whose nodes represent different levels of motion detail: shallower nodes model coarse motion for temporal smoothness, and deeper nodes capture finer motion. Additionally, our model uses a few shared motion bases to represent the motions of different sets of nodes, in line with the assumption that motion tends to be smooth and simple. This design gives the Gaussians a more structured deformation and makes full use of temporal relationships to tackle the challenging task of monocular dynamic 3D reconstruction. Because pixel-level metrics for evaluating monocular dynamic 3D reconstruction can sometimes fail to reflect the true reconstruction quality, we also propose using a more reliable perceptual metric as an alternative. Extensive experiments demonstrate our method's efficacy in achieving superior novel view synthesis from challenging monocular videos with complex motions.
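The abstract does not specify HiMoR's exact parameterization. The idea of coarse motion refined by deeper nodes, with each node's motion expressed through a small set of shared bases, can nonetheless be illustrated with a minimal sketch. It rests on our own assumptions and is not the authors' implementation: motion is restricted to 3D translations for brevity, each node blends a shared basis set with its own coefficients, and a node's overall motion is the sum of its own contribution and those of its ancestors. The names `MotionNode`, `local_translation`, and `world_translation`, and the example basis functions, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Vec3 = Tuple[float, float, float]
Basis = Callable[[float], Vec3]  # hypothetical: maps time t to a 3D translation


@dataclass
class MotionNode:
    """One node of a hypothetical motion tree (translation-only sketch)."""
    coeffs: List[float]                 # node-specific weights over its bases
    bases: List[Basis]                  # basis set shared with other nodes
    parent: Optional["MotionNode"] = None
    children: List["MotionNode"] = field(default_factory=list)

    def add_child(self, coeffs: List[float], bases: List[Basis]) -> "MotionNode":
        # Deeper nodes refine the motion of their parent.
        child = MotionNode(coeffs, bases, parent=self)
        self.children.append(child)
        return child

    def local_translation(self, t: float) -> Vec3:
        # Weighted sum of the shared bases evaluated at time t.
        x = y = z = 0.0
        for w, basis in zip(self.coeffs, self.bases):
            bx, by, bz = basis(t)
            x, y, z = x + w * bx, y + w * by, z + w * bz
        return (x, y, z)

    def world_translation(self, t: float) -> Vec3:
        # Coarse motion from all ancestors plus this node's finer residual.
        x = y = z = 0.0
        node: Optional[MotionNode] = self
        while node is not None:
            lx, ly, lz = node.local_translation(t)
            x, y, z = x + lx, y + ly, z + lz
            node = node.parent
        return (x, y, z)


# Usage: one coarse basis set for the root, a second set for its children.
coarse = [lambda t: (t, 0.0, 0.0)]
fine = [lambda t: (0.0, t, 0.0), lambda t: (0.0, 0.0, 1.0)]
root = MotionNode([2.0], coarse)
leaf = root.add_child([0.5, 0.25], fine)
print(leaf.world_translation(2.0))  # (4.0, 1.0, 0.25)
```

In this sketch the root alone fixes the overall trajectory, and the leaf only adds a small residual on top of it. Any node that references the same basis list reuses those bases and differs only in its coefficients, which is one simple way to read "shared motion bases".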