Abstract
Facial editing is an important task in vision and graphics with numerous applications. However, existing works cannot deliver a continuous and fine-grained editing mode (e.g., editing a slightly smiling face into a broadly laughing one) through natural interaction with users. In this work, we propose Talk-to-Edit, an interactive facial editing framework that performs fine-grained attribute manipulation through dialog between the user and the system. Our key insight is to model a continual "semantic field" in the GAN latent space. 1) Unlike previous works that regard editing as traversing straight lines in the latent space, here fine-grained editing is formulated as finding a curved trajectory that respects the fine-grained attribute landscape on the semantic field. 2) The curvature at each step is location-specific and determined by the input image as well as the users' language requests. 3) To engage users in a meaningful dialog, our system generates language feedback by considering both the user request and the current state of the semantic field. We also contribute CelebA-Dialog, a visual-language facial editing dataset to facilitate large-scale study. Specifically, each image has manually annotated fine-grained attribute labels as well as template-based textual descriptions in natural language. Extensive quantitative and qualitative experiments demonstrate the superiority of our framework in terms of 1) the smoothness of fine-grained editing, 2) identity/attribute preservation, and 3) visual photorealism and dialog fluency. Notably, a user study validates that our overall system is consistently favored by around 80% of the participants. Our project page is https://www.mmlab-ntu.com/project/talkedit/.
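
The contrast between a straight-line edit and a location-specific trajectory can be illustrated with a toy sketch. This is not the Talk-to-Edit implementation: it assumes, purely for illustration, that the semantic field behaves like the normalised gradient of an attribute score, and it replaces the GAN latent space with a 2-D plane. The names `toy_score`, `field`, `straight_edit`, and `curved_edit` are hypothetical.

```python
import math

def toy_score(z):
    """Hypothetical attribute score on a 2-D 'latent' plane (assumption, not a real classifier)."""
    x, y = z
    return -((x - 2.0) ** 2) - 4.0 * (y - 1.0) ** 2

def field(z):
    """Assumed semantic field: unit-length gradient of toy_score at location z."""
    x, y = z
    gx = -2.0 * (x - 2.0)
    gy = -8.0 * (y - 1.0)
    norm = math.hypot(gx, gy)
    if norm == 0.0:
        return (0.0, 0.0)
    return (gx / norm, gy / norm)

def straight_edit(z0, steps, step_size):
    """Straight-line edit: the direction is computed once at z0 and reused."""
    d = field(z0)
    path = [z0]
    z = z0
    for _ in range(steps):
        z = (z[0] + step_size * d[0], z[1] + step_size * d[1])
        path.append(z)
    return path

def curved_edit(z0, steps, step_size):
    """Curved edit: the direction is re-evaluated at every visited location."""
    path = [z0]
    z = z0
    for _ in range(steps):
        d = field(z)
        z = (z[0] + step_size * d[0], z[1] + step_size * d[1])
        path.append(z)
    return path

if __name__ == "__main__":
    start = (0.0, 0.0)
    line = straight_edit(start, steps=20, step_size=0.1)
    curve = curved_edit(start, steps=20, step_size=0.1)
    print("straight end:", line[-1], "score:", toy_score(line[-1]))
    print("curved end:  ", curve[-1], "score:", toy_score(curve[-1]))
```

Both paths take the same first step, since they share the direction at the starting point. They diverge afterwards because only the curved edit adapts its direction to the local landscape. In this sketch the number of steps stands in for the degree of change a user might request.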