OPT: Open Pre-trained Transformer Language Models

Abstract

Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning. Given their computational cost, these models are difficult to replicate without significant capital. For the few that are available through APIs, no access is granted to the full model weights, making them difficult to study. We present Open Pre-trained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters, which we aim to fully and responsibly share with interested researchers. We show that OPT-175B is comparable to GPT-3, while requiring only 1/7th the carbon footprint to develop. We are also releasing our logbook detailing the infrastructure challenges we faced, along with code for experimenting with all of the released models.
