Abstract
In this paper, we present Synergy, a language model that bridges differentlevels of abstraction in an end-to-end fashion through a learned routingmechanism. Focusing on low-level linguistic abstraction, we trained our modelas a byte-level language model. Our model spontaneously learns to tokenizebytes, producing fewer concept tokens than Byte-level Byte Pair Encoder (BBPE)tokenizers while keeping comparable performance. By comparing with Llama3, weobserved an advantage of Synergy under the same model scale and trainingdataset size. Further studies show that the middle part (the higher abstractionpart) of our model performs better when positional encodings are removed,suggesting the emergence of position-independent concepts. These findingsdemonstrate the feasibility of tokenizer-free architectures, paving the way formore robust and flexible pipelines.