Abstract
Advancing towards generalist agents necessitates the concurrent processing of multiple tasks using a unified model, thereby underscoring the growing significance of simultaneous model training on multiple downstream tasks. A common issue in multi-task learning is the occurrence of gradient conflict, which leads to potential competition among different tasks during joint training. This competition often results in improvements in one task at the expense of deterioration in another. Although several optimization methods have been developed to address this issue by manipulating task gradients for better task balancing, they cannot decrease the incidence of gradient conflict. In this paper, we systematically investigate the occurrence of gradient conflict across different methods and propose a strategy to reduce such conflicts through sparse training (ST), wherein only a portion of the model's parameters are updated during training while keeping the rest unchanged. Our extensive experiments demonstrate that ST effectively mitigates conflicting gradients and leads to superior performance. Furthermore, ST can be easily integrated with gradient manipulation techniques, thus enhancing their effectiveness.
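
As a rough illustration of the two notions above, the following toy sketch flags a gradient conflict between two tasks and applies an update restricted to a subset of parameters. It rests on assumptions not stated in the abstract: two gradients are taken to conflict when their inner product is negative (a common convention), the task gradients are simply summed, and the mask selecting the trainable parameters is hand-picked. The helper names (`in_conflict`, `restrict`, `sparse_step`) and all numbers are hypothetical and do not reproduce the paper's parameter-selection procedure.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def in_conflict(g1, g2):
    """Assumed convention: two task gradients conflict if their inner product is negative."""
    return dot(g1, g2) < 0


def restrict(grad, mask):
    """Keep only the gradient entries of parameters selected by the mask."""
    return [g for g, m in zip(grad, mask) if m]


def sparse_step(params, task_grads, mask, lr):
    """One SGD step on the summed task gradients; unmasked parameters stay unchanged."""
    total = [sum(col) for col in zip(*task_grads)]
    return [p - lr * g if m else p for p, g, m in zip(params, total, mask)]


# Toy data (hypothetical): four parameters, two tasks.
params = [1.0, 1.0, 1.0, 1.0]
g_a = [1.0, -2.0, 0.5, 0.0]   # gradient of task A
g_b = [1.0, 2.0, 0.5, 1.0]    # gradient of task B
mask = [True, False, True, False]  # hand-picked trainable subset

print(in_conflict(g_a, g_b))                                  # full parameter space
print(in_conflict(restrict(g_a, mask), restrict(g_b, mask)))  # trainable subset only
print(sparse_step(params, [g_a, g_b], mask, lr=0.1))
```

In this contrived example the full gradients conflict because the tasks disagree on the second parameter, while the gradients restricted to the masked subset do not. This only shows the mechanism on hand-made numbers; whether and how often sparse updates reduce conflicts in practice is the empirical question the paper investigates.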