Abstract
Current auto-regressive models can generate high-quality, topologically precise meshes; however, they require thousands, or even tens of thousands, of next-token predictions during inference, resulting in substantial latency. We introduce XSpecMesh, a quality-preserving acceleration method for auto-regressive mesh generation models. XSpecMesh employs a lightweight, multi-head speculative decoding scheme to predict multiple tokens in parallel within a single forward pass, thereby accelerating inference. We further propose a verification and resampling strategy: the backbone model verifies each predicted token and resamples any tokens that do not meet the quality criteria. In addition, we propose a distillation strategy that trains the lightweight decoding heads by distilling from the backbone model, encouraging their prediction distributions to align with the backbone's and improving the success rate of speculative predictions. Extensive experiments demonstrate that our method achieves a 1.7x speedup without sacrificing generation quality. Our code will be released.
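To make the draft, verify, and resample loop concrete, the following is a minimal toy sketch in Python, not the XSpecMesh implementation. Everything in it is an assumption for illustration: the names `backbone_probs`, `draft_heads`, `verify_and_resample`, and `generate` are hypothetical, the "backbone" and "heads" are hand-written stand-ins rather than neural networks, and the acceptance rule (a fixed probability threshold) is a placeholder, since the abstract does not specify the quality criteria.

```python
import random

VOCAB = 8  # hypothetical toy vocabulary size


def backbone_probs(prefix):
    """Toy backbone: puts 0.7 on (last token + 1) mod VOCAB, spreads 0.3 over the rest."""
    probs = [0.3 / (VOCAB - 1)] * VOCAB
    probs[(prefix[-1] + 1) % VOCAB] = 0.7
    return probs


def draft_heads(prefix, k):
    """Toy decoding heads: head i guesses the token i + 1 steps ahead.

    A wrong guess is injected at every fifth position so some drafts get rejected.
    """
    draft = []
    for i in range(k):
        guess = (prefix[-1] + i + 1) % VOCAB
        if (len(prefix) + i) % 5 == 4:
            guess = (guess + 1) % VOCAB  # deliberately wrong guess
        draft.append(guess)
    return draft


def verify_and_resample(prefix, draft, threshold, rng):
    """Accept drafted tokens while the backbone assigns them enough probability.

    Assumed rule: at the first rejected token, sample a replacement from the
    backbone distribution and discard the remaining drafts. A real system would
    score all drafted positions in one backbone pass; the loop is for clarity.
    """
    accepted = []
    for tok in draft:
        probs = backbone_probs(prefix + accepted)
        if probs[tok] >= threshold:
            accepted.append(tok)
        else:
            accepted.append(rng.choices(range(VOCAB), weights=probs)[0])
            break
    return accepted


def generate(prompt, n_new, k=4, threshold=0.5, seed=0):
    """Generate n_new tokens; return the sequence and the number of verify rounds."""
    rng = random.Random(seed)
    seq = list(prompt)
    target = len(prompt) + n_new
    rounds = 0
    while len(seq) < target:
        draft = draft_heads(seq, k)
        seq += verify_and_resample(seq, draft, threshold, rng)
        rounds += 1
    return seq[:target], rounds


seq, rounds = generate([0], 20)
print(f"generated {len(seq) - 1} tokens in {rounds} verification rounds")
```

Each round appends at least one token, so the loop always terminates, and any round that accepts more than one drafted token saves backbone passes relative to plain next-token decoding. In this sketch the speedup depends entirely on how often the drafts pass verification, which is the quantity the distillation strategy described above is meant to improve.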