InstructG2I: Synthesizing Images from Multimodal Attributed Graphs

Abstract

In this paper, we approach an overlooked yet critical task Graph2Image: generating images from multimodal attributed graphs (MMAGs). This task poses significant challenges due to the explosion in graph size, dependencies among graph entities, and the need for controllability in graph conditions. To address these challenges, we propose a graph context-conditioned diffusion model called InstructG2I. InstructG2I first exploits the graph structure and multimodal information to conduct informative neighbor sampling by combining personalized page rank and re-ranking based on vision-language features. Then, a Graph-QFormer encoder adaptively encodes the graph nodes into an auxiliary set of graph prompts to guide the denoising process of diffusion. Finally, we propose graph classifier-free guidance, enabling controllable generation by varying the strength of graph guidance and multiple connected edges to a node. Extensive experiments conducted on three datasets from different domains demonstrate the effectiveness and controllability of our approach. The code is available at https://github.com/PeterGriffinJin/InstructG2I.
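
The neighbor sampling step described above can be illustrated with a minimal sketch. It is not the InstructG2I implementation: the function names (`personalized_pagerank`, `sample_neighbors`), the toy graph, the two-dimensional feature vectors, and the parameters `alpha`, `pool_size`, and `k` are all assumptions made for illustration. In particular, plain cosine similarity over hand-written vectors stands in for the vision-language features mentioned in the abstract. The sketch first scores nodes by personalized PageRank from the target node, keeps a structurally relevant candidate pool, and then re-ranks that pool by feature similarity.

```python
from math import sqrt

def personalized_pagerank(adj, source, alpha=0.15, iters=50):
    """Power iteration for PageRank with restart probability `alpha` at `source`."""
    scores = {n: 0.0 for n in adj}
    scores[source] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in adj}
        for n, nbrs in adj.items():
            walk_mass = (1 - alpha) * scores[n]
            if not nbrs:
                # Dangling node: send its walking mass back to the source.
                nxt[source] += walk_mass
                continue
            for m in nbrs:
                nxt[m] += walk_mass / len(nbrs)
        # Restart mass (total score is always 1) returns to the source.
        nxt[source] += alpha
        scores = nxt
    return scores

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def sample_neighbors(adj, features, target, query_vec, pool_size=3, k=2):
    """Structure-based candidate pool (PPR), then feature-based re-ranking."""
    ppr = personalized_pagerank(adj, target)
    pool = sorted((n for n in adj if n != target),
                  key=lambda n: ppr[n], reverse=True)[:pool_size]
    return sorted(pool, key=lambda n: cosine(features[n], query_vec),
                  reverse=True)[:k]

# Toy graph (hypothetical): "t" is the target node, "d" is disconnected.
adj = {"t": ["a", "b"], "a": ["t", "c"], "b": ["t"], "c": ["a"], "d": []}
# Stand-in feature vectors (hypothetical, not real vision-language embeddings).
features = {"t": [1.0, 0.0], "a": [1.0, 0.0], "b": [0.0, 1.0],
            "c": [0.9, 0.1], "d": [1.0, 0.0]}

ppr_scores = personalized_pagerank(adj, "t")
selected = sample_neighbors(adj, features, "t", query_vec=[1.0, 0.0])
print(selected)
```

In this toy setup the disconnected node "d" matches the query vector perfectly but receives zero PageRank mass, so it never enters the candidate pool. This is the motivation for combining the two signals: structure filters out unrelated nodes, and feature similarity orders the ones that remain.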
