In multi-task learning, multiple tasks are solved jointly, sharing inductive bias between them. Multi-task learning is inherently a multi-objective problem because different tasks may conflict, necessitating a trade-off. A common compromise is to optimize a proxy objective that minimizes a weighted linear combination of per-task losses. However, this workaround is only valid when the tasks do not compete, which is rarely the case. In this paper, we explicitly cast multi-task learning as multi-objective optimization, with the overall objective of finding a Pareto optimal solution. To this end, we use algorithms developed in the gradient-based multi-objective optimization literature. These algorithms are not directly applicable to large-scale learning problems since they scale poorly with the dimensionality of the gradients and the number of tasks. We therefore propose an upper bound for the multi-objective loss and show that it can be optimized efficiently. We further prove that optimizing this upper bound yields a Pareto optimal solution under realistic assumptions. We apply our method to a variety of multi-task deep learning problems including digit classification, scene understanding (joint semantic segmentation, instance segmentation, and depth estimation), and multi-label classification. Our method produces higher-performing models than recent multi-task learning formulations or per-task training.
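
To make the contrast with a fixed weighted sum concrete, the following is a minimal sketch of one idea from the gradient-based multi-objective optimization literature, restricted to two tasks: choose the convex combination of the two task gradients with the smallest norm, which has a closed form. It is an illustration under stated assumptions, not necessarily the exact procedure developed in this paper; the function names `min_norm_alpha` and `common_direction` and the toy gradients are hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def min_norm_alpha(g1, g2):
    """Return alpha in [0, 1] minimizing ||alpha*g1 + (1 - alpha)*g2||^2.

    Setting the derivative with respect to alpha to zero gives
    alpha = ((g2 - g1) . g2) / ||g1 - g2||^2, which is then clipped to [0, 1].
    """
    diff = [a - b for a, b in zip(g1, g2)]
    denom = dot(diff, diff)
    if denom == 0.0:
        # Identical gradients: every alpha yields the same combination.
        return 0.5
    alpha = dot([b - a for a, b in zip(g1, g2)], g2) / denom
    return min(1.0, max(0.0, alpha))


def common_direction(g1, g2):
    """Min-norm convex combination of the two task gradients."""
    alpha = min_norm_alpha(g1, g2)
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(g1, g2)]


# Hypothetical gradients of two task losses w.r.t. shared parameters.
g1 = [1.0, 0.0]
g2 = [0.0, 1.0]
alpha = min_norm_alpha(g1, g2)
d = common_direction(g1, g2)
```

In this toy case the combined vector `d` has a positive inner product with both task gradients, so a small step along `-d` decreases both losses to first order. When one gradient is much larger than the other, the clipping step selects the shorter gradient instead of a mixture.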