What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers

  • 2021-09-10 03:32:19
  • Boseop Kim, HyoungSeok Kim, Sang-Woo Lee, Gichang Lee, Donghyun Kwak, Dong Hyeon Jeon, Sunghyun Park, Sungju Kim, Seonhoon Kim, Dongpil Seo, Heungsub Lee, Minyoung Jeong, Sungjae Lee, Minsub Kim, Suk Hyun Ko, Seokhun Kim, Taeyong Park, Jinuk Kim, Soyoung Kang, Na-Hyeon Ryu, Kang Min Yoo, Minsuk Chang, Soobin Suh, Sookyo In, Jinseong Park, Kyungduk Kim, Hiun Kim, Jisu Jeong, Yong Goo Yeo, Donghoon Ham, Dongju Park, Min Young Lee, Jaewook Kang, Inho Kang, Jung-Woo Ha, Woomyoung Park, Nako Sung

Abstract

GPT-3 shows the remarkable in-context learning ability of large-scale language models (LMs) trained on hundreds of billions of tokens. Here we address some remaining issues less reported by the GPT-3 paper, such as a non-English LM, the performance of models of different sizes, and the effect of recently introduced prompt optimization on in-context learning. To achieve this, we introduce HyperCLOVA, an 82B-parameter Korean variant of GPT-3 trained on a Korean-centric corpus of 560B tokens. Enhanced by our Korean-specific tokenization, HyperCLOVA with our training configuration shows state-of-the-art in-context zero-shot and few-shot learning performance on various downstream tasks in Korean. We also show the performance benefits of prompt-based learning and demonstrate how it can be integrated into the prompt engineering pipeline. We then discuss the possibility of materializing the No Code AI paradigm by introducing HyperCLOVA studio, an interactive prompt engineering interface that provides AI prototyping capabilities to non-experts in ML. Lastly, we demonstrate the potential of our methods with three successful in-house applications.
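In zero-shot and few-shot in-context learning the model is not fine-tuned. Labeled demonstrations (or none, in the zero-shot case) are placed directly in the prompt, and the LM completes an open slot for the new query. The Python sketch below illustrates only that generic prompt-assembly step. The `Input:`/`Output:` template, the function name `build_few_shot_prompt`, and the sentiment example are hypothetical illustrations, not the prompt format used for HyperCLOVA, and no model is called.

```python
from typing import List, Tuple


def build_few_shot_prompt(examples: List[Tuple[str, str]], query: str,
                          instruction: str = "", sep: str = "\n\n") -> str:
    """Assemble an in-context learning prompt (hypothetical template).

    `examples` holds (input, label) demonstrations; passing an empty list
    yields a zero-shot prompt. The query ends with an open "Output:" slot
    that a generative LM would be asked to complete.
    """
    blocks = []
    if instruction:
        # Optional natural-language task description placed first.
        blocks.append(instruction)
    for text, label in examples:
        # Each demonstration pairs an input with its gold label.
        blocks.append(f"Input: {text}\nOutput: {label}")
    # Unlabeled query; the model's continuation is taken as the prediction.
    blocks.append(f"Input: {query}\nOutput:")
    return sep.join(blocks)


if __name__ == "__main__":
    demos = [("great movie", "positive"), ("boring plot", "negative")]
    print(build_few_shot_prompt(demos, "loved it",
                                instruction="Classify the sentiment."))
```

Changing the number of demonstrations is how the zero-shot and few-shot settings differ. The prompt-optimization methods the abstract mentions tune how such prompts are built rather than the model weights.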

 
