What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers

Abstract

GPT-3 shows the remarkable in-context learning ability of large-scale language models (LMs) trained on hundreds of billions of tokens of data. Here we address some remaining issues less reported by the GPT-3 paper, such as a non-English LM, the performance of differently sized models, and the effect of recently introduced prompt optimization on in-context learning. To achieve this, we introduce HyperCLOVA, an 82B-parameter Korean variant of GPT-3 trained on a Korean-centric corpus of 560B tokens. Enhanced by our Korean-specific tokenization, HyperCLOVA with our training configuration achieves state-of-the-art in-context zero-shot and few-shot learning performance on various downstream tasks in Korean. We also show the performance benefits of prompt-based learning and demonstrate how it can be integrated into the prompt engineering pipeline. We then discuss the possibility of materializing the No Code AI paradigm by introducing HyperCLOVA Studio, an interactive prompt engineering interface that provides AI prototyping capabilities to non-experts in ML. Lastly, we demonstrate the potential of our methods with three successful in-house applications.
