Language models are better than humans at next-token prediction

Abstract

Current language models are considered to have sub-human capabilities at natural language tasks like question answering or writing code. However, language models are not trained to perform well at these tasks; they are trained to accurately predict the next token given the previous tokens in tokenized text. It is not clear whether language models are better or worse than humans at next-token prediction. To try to answer this question, we performed two distinct experiments to directly compare humans and language models on this front: one measuring top-1 accuracy and the other measuring perplexity. In both experiments, we find humans to be consistently worse than even relatively small language models like GPT3-Ada at next-token prediction.
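The two metrics named above have standard definitions. The sketch below shows how each would be computed from per-token predictions. It assumes that each predictor, human or model, supplies a single best guess per position (for top-1 accuracy) and a probability for the token that actually came next (for perplexity). It illustrates only the textbook formulas, not the authors' procedure for eliciting guesses or probabilities from human participants. The function names and the toy tokens and probabilities are hypothetical.

```python
import math
from typing import Sequence


def top1_accuracy(guesses: Sequence[str], targets: Sequence[str]) -> float:
    """Fraction of positions where the single best guess equals the true next token."""
    if len(guesses) != len(targets) or not targets:
        raise ValueError("need equal-length, non-empty sequences")
    hits = sum(g == t for g, t in zip(guesses, targets))
    return hits / len(targets)


def perplexity(target_probs: Sequence[float]) -> float:
    """exp of the mean negative log-probability assigned to each true next token."""
    if not target_probs or any(p <= 0.0 or p > 1.0 for p in target_probs):
        raise ValueError("probabilities must be non-empty and lie in (0, 1]")
    mean_nll = -sum(math.log(p) for p in target_probs) / len(target_probs)
    return math.exp(mean_nll)


# Hypothetical toy data, not taken from the paper's experiments.
targets = ["the", "cat", "sat", "down"]
guesses = ["the", "dog", "sat", "up"]   # two of four guesses are correct
probs = [0.5, 0.25, 0.5, 0.125]         # probability given to each true token

print(top1_accuracy(guesses, targets))  # 0.5
print(perplexity(probs))                # ≈ 3.364 (= 2 ** 1.75)
```

Higher top-1 accuracy and lower perplexity are both better. Perplexity also rewards a predictor that spreads probability sensibly when its top guess is wrong, which is why it complements top-1 accuracy.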
