LoRAtorio: An intrinsic approach to LoRA Skill Composition

Abstract

Low-Rank Adaptation (LoRA) has become a widely adopted technique in text-to-image diffusion models, enabling the personalisation of visual concepts such as characters, styles, and objects. However, existing approaches struggle to compose multiple LoRA adapters effectively, particularly in open-ended settings where the number and nature of the required skills are not known in advance. In this work, we present LoRAtorio, a novel training-free framework for multi-LoRA composition that leverages intrinsic model behaviour. Our method is motivated by two key observations: (1) LoRA adapters trained on narrow domains produce denoised outputs that diverge from the base model, and (2) when operating out-of-distribution, LoRA outputs behave more like the base model than when conditioned in-distribution. The balance between these two observations yields exceptional performance in the single-LoRA scenario, but this performance deteriorates when multiple LoRAs are loaded. Our method operates in the latent space by dividing it into spatial patches and computing the cosine similarity between each patch's predicted noise and that of the base model. These similarities are used to construct a spatially-aware weight matrix, which guides a weighted aggregation of LoRA outputs. To address domain drift, we further propose a modification to classifier-free guidance that incorporates the base model's unconditional score into the composition. We extend this formulation to a dynamic module selection setting, enabling inference-time selection of relevant LoRA adapters from a large pool. LoRAtorio achieves state-of-the-art performance, showing up to a 1.3% improvement in CLIPScore and a 72.43% win rate in GPT-4V pairwise evaluations, and generalises effectively to multiple latent diffusion models.
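The abstract describes the composition step only at a high level. The following is a minimal NumPy sketch of one possible reading of it, not the paper's published equations. It assumes that each LoRA's noise prediction is computed separately and that a patch where a LoRA diverges more from the base model (lower cosine similarity) is one where that LoRA is in-distribution, so it receives more weight. The patch size, the temperature `tau`, the softmax over negative similarities, and the guidance formula in `guided_eps` are all illustrative assumptions. All function names are hypothetical.

```python
import numpy as np


def patchify(eps: np.ndarray, patch: int) -> np.ndarray:
    """Split a (C, H, W) noise prediction into (num_patches, C * patch * patch) vectors."""
    c, h, w = eps.shape
    assert h % patch == 0 and w % patch == 0, "latent size must be divisible by patch"
    x = eps.reshape(c, h // patch, patch, w // patch, patch)
    # (H/p, W/p, C, p, p): one row per spatial patch, in row-major patch order
    x = x.transpose(1, 3, 0, 2, 4)
    return x.reshape((h // patch) * (w // patch), c * patch * patch)


def expand_weights(weights: np.ndarray, h: int, w: int, patch: int) -> np.ndarray:
    """Broadcast per-patch weights of shape (K, num_patches) back to (K, 1, H, W)."""
    k = weights.shape[0]
    grid = weights.reshape(k, h // patch, w // patch)
    full = np.repeat(np.repeat(grid, patch, axis=1), patch, axis=2)
    return full[:, None, :, :]


def compose_lora_eps(eps_base, eps_loras, patch=2, tau=1.0):
    """Hypothetical patch-wise weighted aggregation of K LoRA noise predictions.

    eps_base:  (C, H, W) noise prediction of the base model
    eps_loras: (K, C, H, W) noise predictions, one per LoRA loaded on its own
    """
    _, h, w = eps_base.shape
    base = patchify(eps_base, patch)                            # (P, D)
    loras = np.stack([patchify(e, patch) for e in eps_loras])   # (K, P, D)
    # Cosine similarity between each LoRA patch and the matching base patch.
    num = np.einsum("kpd,pd->kp", loras, base)
    den = np.linalg.norm(loras, axis=-1) * np.linalg.norm(base, axis=-1) + 1e-8
    sim = num / den                                             # (K, P)
    # Assumption: lower similarity -> more weight, via a per-patch softmax
    # over the K LoRAs of -sim / tau (numerically stabilised).
    logits = -sim / tau
    logits -= logits.max(axis=0, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=0, keepdims=True)               # each patch sums to 1
    w_full = expand_weights(weights, h, w, patch)               # (K, 1, H, W)
    return (w_full * eps_loras).sum(axis=0)                     # (C, H, W)


def guided_eps(eps_cond_composed, eps_uncond_base, scale=7.5):
    """Assumed CFG variant: the base model's unconditional prediction is the anchor."""
    return eps_uncond_base + scale * (eps_cond_composed - eps_uncond_base)
```

Because the weights form a convex combination within each patch, identical LoRA predictions pass through unchanged, and a LoRA that diverges strongly from the base model dominates the patches where it does so. The softmax is only one way to turn similarities into a spatially-aware weight matrix; a hard arg-min over LoRAs per patch would be a sharper alternative under the same assumption.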
