How Powerful are Decoder-Only Transformer Neural Models?

Abstract

In this article we prove that the general transformer neural modelundergirding modern large language models (LLMs) is Turing complete underreasonable assumptions. This is the first work to directly address the Turingcompleteness of the underlying technology employed in GPT-x as past work hasfocused on the more expressive, full auto-encoder transformer architecture.From this theoretical analysis, we show that the sparsity/compressibility ofthe word embedding is an important consideration for Turing completeness tohold. We also show that Transformers are are a variant of B machines studied byHao Wang.

Quick Read (beta)

loading the full paper ...