How to make a pizza: Learning a compositional layer-based GAN model

Abstract

A food recipe is an ordered set of instructions for preparing a particular dish. From a visual perspective, every instruction step can be seen as a way to change the visual appearance of the dish by adding extra objects (e.g., adding an ingredient) or changing the appearance of the existing ones (e.g., cooking the dish). In this paper, we aim to teach a machine how to make a pizza by building a generative model that mirrors this step-by-step procedure. To do so, we learn composable module operations that can either add or remove a particular ingredient. Each operator is designed as a Generative Adversarial Network (GAN). Given only weak image-level supervision, the operators are trained to generate a visual layer that needs to be added to or removed from the existing image. The proposed model is able to decompose an image into an ordered sequence of layers by sequentially applying the corresponding removal modules in the right order. Experimental results on synthetic and real pizza images demonstrate that our proposed model is able to: (1) segment pizza toppings in a weakly supervised fashion, (2) remove them by revealing what is occluded underneath them (i.e., inpainting), and (3) infer the ordering of the toppings without any depth ordering supervision. Code, data, and models are available online.
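
To make the layer idea concrete, the following is a minimal sketch of how ordered layers might be stacked and peeled off. It assumes each layer is a per-pixel color plus a mask in [0, 1], blended with standard alpha compositing; the abstract does not state the blending rule, so this is an illustrative assumption rather than the paper's implementation. The names `composite`, `stack`, and the toy three-pixel "pizza" are hypothetical, and the layers here are hand-written rather than generated by a GAN.

```python
# Hypothetical sketch of layer-based composition; not the authors' code.
# Assumption: a layer is (colors, mask), blended by alpha compositing.

def composite(image, layer_rgb, mask):
    """Blend one layer over an image: out = mask * layer + (1 - mask) * image."""
    return [
        tuple(m * l + (1.0 - m) * p for l, p in zip(layer_px, image_px))
        for image_px, layer_px, m in zip(image, layer_rgb, mask)
    ]


def stack(base, layers):
    """Apply layers bottom-to-top, so later layers occlude earlier ones."""
    image = base
    for layer_rgb, mask in layers:
        image = composite(image, layer_rgb, mask)
    return image


# Toy 3-pixel "pizza": a plain dough base and two topping layers.
dough = [(0.9, 0.8, 0.6)] * 3
pepperoni = ([(0.7, 0.1, 0.1)] * 3, [1.0, 1.0, 0.0])  # covers pixels 0 and 1
olives = ([(0.0, 0.0, 0.0)] * 3, [0.0, 1.0, 1.0])     # covers pixels 1 and 2

full = stack(dough, [pepperoni, olives])
# "Removing" the top layer corresponds to re-stacking without it, which
# reveals whatever the removed topping was occluding.
without_olives = stack(dough, [pepperoni])
```

In this toy setting, removing the olive layer exposes pepperoni at pixel 1 and bare dough at pixel 2, which mirrors the inpainting behaviour the abstract describes. In the actual model the occluded content is not known in advance and must be produced by the learned removal operator.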
