Abstract
This paper introduces Virtual Try-Off (VTOFF), a novel task focused on generating standardized garment images from single photos of clothed individuals. Unlike traditional Virtual Try-On (VTON), which digitally dresses models, VTOFF aims to extract a canonical garment image, posing unique challenges in capturing garment shape, texture, and intricate patterns. This well-defined target makes VTOFF particularly effective for evaluating reconstruction fidelity in generative models. We present TryOffDiff, a model that adapts Stable Diffusion with SigLIP-based visual conditioning to ensure high fidelity and detail retention. Experiments on a modified VITON-HD dataset show that our approach outperforms baseline methods based on pose transfer and virtual try-on with fewer pre- and post-processing steps. Our analysis reveals that traditional image generation metrics inadequately assess reconstruction quality, prompting us to rely on DISTS for more accurate evaluation. Our results highlight the potential of VTOFF to enhance product imagery in e-commerce applications, advance generative model evaluation, and inspire future work on high-fidelity reconstruction. Demo, code, and models are available at: https://rizavelioglu.github.io/tryoffdiff/