Abstract
Multi-task learning for dense prediction is limited by the need for extensive annotation for every task, though recent works have explored training with partial task labels. Leveraging the generalization power of diffusion models, we extend the partial learning setup to a zero-shot setting, training a multi-task model on multiple synthetic datasets, each labeled for only a subset of tasks. Our method, StableMTL, repurposes image generators for latent regression, adapting a denoising framework with task encoding, per-task conditioning, and a tailored training scheme. Instead of per-task losses that require careful balancing, we adopt a unified latent loss, enabling seamless scaling to more tasks. To encourage inter-task synergy, we introduce a multi-stream model with a task-attention mechanism that converts N-to-N task interactions into efficient 1-to-N attention, promoting effective cross-task sharing. StableMTL outperforms baselines on 7 tasks across 8 benchmarks.