In Defense of the Unitary Scalarization for Deep Multi-Task Learning

Abstract

Recent multi-task learning research argues against unitary scalarization, where training simply minimizes the sum of the task losses. Several ad-hoc multi-task optimization algorithms have instead been proposed, inspired by various hypotheses about what makes multi-task settings difficult. The majority of these optimizers require per-task gradients, and introduce significant memory, runtime, and implementation overhead. We present a theoretical analysis suggesting that many specialized multi-task optimizers can be interpreted as forms of regularization. Moreover, we show that, when coupled with standard regularization and stabilization techniques from single-task learning, unitary scalarization matches or improves upon the performance of complex multi-task optimizers in both supervised and reinforcement learning settings. We believe our results call for a critical reevaluation of recent research in the area.
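
As a rough illustration of what "minimizes the sum of the task losses" means in practice, the following is a minimal sketch on a toy problem, not the paper's experimental setup. The two quadratic task losses, the shared scalar parameter `theta`, and the helper names (`task_losses`, `unitary_loss`, `unitary_grad`, `train`) are all assumptions made up for this example. The point it shows is that unitary scalarization needs only one gradient of the summed loss per step, with no per-task gradients stored or combined.

```python
# Hypothetical toy setup: two tasks share one scalar parameter theta.
# The quadratic losses below are illustrative assumptions, not from the paper.

def task_losses(theta):
    """Return the list of per-task losses for the shared parameter theta."""
    return [(theta - 1.0) ** 2, (theta - 3.0) ** 2]


def unitary_loss(theta):
    """Unitary scalarization: the plain, unweighted sum of the task losses."""
    return sum(task_losses(theta))


def unitary_grad(theta):
    """Analytic gradient of the summed loss with respect to theta.

    Only this single gradient is needed per step; no per-task
    gradients are kept or manipulated.
    """
    return 2.0 * (theta - 1.0) + 2.0 * (theta - 3.0)


def train(theta=0.0, lr=0.1, steps=200):
    """Plain gradient descent on the summed loss."""
    for _ in range(steps):
        theta -= lr * unitary_grad(theta)
    return theta


theta_star = train()
print(theta_star, unitary_loss(theta_star))
```

In this toy case the summed loss is minimized midway between the two task optima. In a deep-learning framework the same idea would amount to summing the task losses and backpropagating once, which is where the memory and runtime savings over per-task-gradient methods come from.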
