Abstract
Selection is the first step in many image editing processes, enabling faster and simpler modifications of all pixels sharing a common modality. In this work, we present a method for material selection in images, robust to lighting and reflectance variations, which can be used for downstream editing tasks. We rely on vision transformer (ViT) models and leverage their features for selection, proposing a multi-resolution processing strategy that yields finer and more stable selection results than prior methods. Furthermore, we enable selection at two levels: texture and subtexture, leveraging a new two-level material selection (DuMaS) dataset which includes dense annotations for over 800,000 synthetic images, both on the texture and subtexture levels.