Abstract
Since the release of TÜLU [Wang et al., 2023b], open resources for instruction tuning have developed quickly, from better base models to new finetuning techniques. We test and incorporate a number of these advances into TÜLU, resulting in TÜLU 2, a suite of improved TÜLU models for advancing the understanding and best practices of adapting pretrained language models to downstream tasks and user preferences. Concretely, we release: (1) TÜLU-V2-mix, an improved collection of high-quality instruction datasets; (2) TÜLU 2, LLAMA-2 models finetuned on the V2 mixture; (3) TÜLU 2+DPO, TÜLU 2 models trained with direct preference optimization (DPO), including the largest DPO-trained model to date (TÜLU 2+DPO 70B); (4) CODE TÜLU 2, CODE LLAMA models finetuned on our V2 mix that outperform CODE LLAMA and its instruction-tuned variant, CODE LLAMA-Instruct. Our evaluation from multiple perspectives shows that the TÜLU 2 suite achieves state-of-the-art performance among open models and matches or exceeds the performance of GPT-3.5-turbo-0301 on several benchmarks. We release all the checkpoints, data, training and evaluation code to facilitate future open efforts on adapting large language models.