Abstract
Current multi-subject customization approaches face two critical challenges: the difficulty of acquiring diverse multi-subject training data, and attribute entanglement across different subjects. To bridge these gaps, we propose MUSAR, a simple yet effective framework that achieves robust multi-subject customization while requiring only single-subject training data. First, to break the data limitation, we introduce debiased diptych learning. It constructs diptych training pairs from single-subject images to facilitate multi-subject learning, while actively correcting the distribution bias introduced by diptych construction via static attention routing and dual-branch LoRA. Second, to eliminate cross-subject entanglement, we introduce a dynamic attention routing mechanism, which adaptively establishes bijective mappings between generated images and conditional subjects. This design not only decouples multi-subject representations but also maintains scalable generalization performance as the number of reference subjects increases. Comprehensive experiments demonstrate that MUSAR outperforms existing methods, even those trained on multi-subject datasets, in image quality, subject consistency, and interaction naturalness, despite requiring only a single-subject dataset.
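
To make the diptych idea concrete, the following is a minimal sketch of composing two single-subject images into one side-by-side canvas. It is an illustration under stated assumptions, not MUSAR's actual pipeline: the abstract does not specify how diptychs are built, so the helper name `make_diptych`, the horizontal layout, and the random stand-in images are all hypothetical. The debiasing components (static attention routing and dual-branch LoRA) are not modeled here.

```python
import numpy as np


def make_diptych(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    """Place two H x W x C images side by side as one H x 2W x C canvas.

    Hypothetical helper: a horizontal layout is assumed for illustration.
    """
    if left.shape != right.shape:
        raise ValueError("both panels must have the same shape")
    # Concatenate along the width axis (axis=1 for H x W x C arrays).
    return np.concatenate([left, right], axis=1)


# Random stand-ins for two single-subject images (assumed 4 x 6 RGB).
rng = np.random.default_rng(0)
subject_a = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)
subject_b = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)

diptych = make_diptych(subject_a, subject_b)
print(diptych.shape)
```

Each panel of the resulting canvas holds exactly one subject, which is what lets single-subject data stand in for a multi-subject training example in this sketch.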