Abstract
Content-aware graphic layout generation aims to automatically arrange visual elements alongside given content, such as an e-commerce product image. In this paper, we argue that current layout generation approaches suffer from limited training data for the high-dimensional layout structure. We show that a simple retrieval augmentation can significantly improve the generation quality. Our model, named Retrieval-Augmented Layout Transformer (RALF), retrieves nearest neighbor layout examples based on an input image and feeds these results into an autoregressive generator. Our model can apply retrieval augmentation to various controllable generation tasks and yield high-quality layouts within a unified architecture. Our extensive experiments show that RALF successfully generates content-aware layouts in both constrained and unconstrained settings and significantly outperforms the baselines.
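The retrieval step described above can be illustrated with a minimal sketch. It is not the RALF implementation: the abstract does not specify the image encoder, the similarity measure, or how retrieved layouts are passed to the generator. The sketch assumes precomputed image feature vectors and cosine similarity, and every name in it (`retrieve_nearest_layouts`, `db_feats`, `db_layouts`, the layout tuple format) is hypothetical.

```python
import numpy as np


def retrieve_nearest_layouts(query_feat, db_feats, db_layouts, k=3):
    """Return the k database layouts whose image features are closest to the query.

    Assumption: closeness is measured by cosine similarity over precomputed
    image features; the paper's actual retrieval metric may differ.
    """
    # Normalize so that the dot product equals cosine similarity.
    q = query_feat / np.linalg.norm(query_feat)
    db = db_feats / np.linalg.norm(db_feats, axis=1, keepdims=True)
    sims = db @ q
    # Indices of the k most similar database entries, best first.
    top = np.argsort(-sims)[:k]
    return [db_layouts[i] for i in top], sims[top]


# Toy database: four hypothetical image features, each paired with a layout.
# A layout here is a list of (element_type, x, y, width, height) tuples.
db_feats = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]])
db_layouts = [
    [("logo", 0.1, 0.1, 0.2, 0.1)],
    [("text", 0.1, 0.8, 0.8, 0.1)],
    [("logo", 0.1, 0.1, 0.2, 0.1), ("text", 0.1, 0.8, 0.8, 0.1)],
    [("underlay", 0.0, 0.0, 1.0, 0.3)],
]

query = np.array([0.9, 0.1])  # feature of the input image (hypothetical)
retrieved, scores = retrieve_nearest_layouts(query, db_feats, db_layouts, k=2)
```

In a retrieval-augmented generator, the `retrieved` layouts would then serve as additional conditioning for the autoregressive decoder; how that conditioning is wired is left out here because the abstract does not describe it.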