Abstract
We present TinyLlama, a compact 1.1B language model pretrained on around 1 trillion tokens for approximately 3 epochs. Building on the architecture and tokenizer of Llama 2, TinyLlama leverages various advances contributed by the open-source community (e.g., FlashAttention), achieving better computational efficiency. Despite its relatively small size, TinyLlama demonstrates remarkable performance in a series of downstream tasks. It significantly outperforms existing open-source language models with comparable sizes. Our model checkpoints and code are publicly available on GitHub at https://github.com/jzhang38/TinyLlama.
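To make the scale of the training budget concrete, the following is a minimal back-of-the-envelope sketch using only the rounded figures quoted above (1.1B parameters, about 1 trillion tokens, about 3 epochs). The function name `training_budget` and its return fields are hypothetical, chosen for illustration, and the results are rough estimates because the inputs themselves are approximate.

```python
def training_budget(num_params: int, unique_tokens: int, epochs: int) -> dict:
    """Estimate total tokens processed and tokens seen per parameter.

    All inputs are rounded figures from the abstract, so the outputs
    are order-of-magnitude estimates, not reported results.
    """
    total_tokens = unique_tokens * epochs          # tokens processed over all epochs
    tokens_per_param = total_tokens / num_params   # training tokens per model parameter
    return {
        "total_tokens": total_tokens,
        "tokens_per_param": tokens_per_param,
    }


# Approximate figures taken from the abstract.
budget = training_budget(
    num_params=1_100_000_000,         # 1.1B parameters
    unique_tokens=1_000_000_000_000,  # around 1 trillion tokens
    epochs=3,                         # approximately 3 epochs
)
print(f"total tokens processed: {budget['total_tokens']:,}")
print(f"tokens per parameter:   {budget['tokens_per_param']:.0f}")
```

Under these assumptions the model processes roughly 3 trillion tokens in total, which is on the order of a few thousand training tokens per parameter.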