Abstract
A considerable part of the performance of today's large language models (LLMs) and multimodal large language models (MLLMs) depends on their tokenization strategies. While tokenizers are extensively researched for textual and visual input, tokenization strategies for gaze data have not yet been studied, owing to the nature of the data. However, such a strategy would make it possible to use the vision capabilities of pre-trained MLLMs for gaze data, for example, through fine-tuning. In this paper, we aim to close this research gap by analyzing five different tokenizers for gaze data on three different datasets for the forecasting and generation of gaze data through LLMs (cf.~\cref{fig:teaser}). We evaluate the tokenizers with regard to their reconstruction and compression abilities. Further, we train an LLM for each tokenization strategy and measure its generative and predictive performance. Overall, we found that a quantile tokenizer outperforms all others in predicting gaze positions, while a k-means tokenizer performs best when predicting gaze velocities.