Transformer architectures show spectacular performance on NLP tasks and have recently also been used for tasks such as image completion and image classification. Here we propose a sequential image representation in which each prefix of the complete sequence describes the whole image at reduced resolution. With such Fourier Domain Encodings (FDEs), an auto-regressive image completion task is equivalent to predicting a higher-resolution output from a low-resolution input. Additionally, we show that an encoder-decoder setup can be used to query arbitrary Fourier coefficients given a set of Fourier domain observations. We demonstrate the practicality of this approach on computed tomography (CT) image reconstruction. In summary, we show that the Fourier Image Transformer (FIT) can solve relevant image analysis tasks in Fourier space, a domain inherently inaccessible to convolutional architectures.
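The central property of an FDE is that every prefix of the sequence is a complete, lower-resolution view of the image. The following minimal sketch illustrates that property using NumPy. It assumes a simple ordering of the centered 2D DFT coefficients by radial distance from the zero-frequency term. The function names `fourier_domain_encoding` and `decode_prefix` are hypothetical, and the ordering is an illustrative choice. It is not necessarily the exact encoding used by FIT, which may, for example, exploit the conjugate symmetry of real images or split coefficients into amplitude and phase.

```python
import numpy as np


def fourier_domain_encoding(img):
    """Flatten a 2D image into a sequence of Fourier coefficients ordered by
    radial distance from the zero-frequency (DC) term, lowest frequencies first.
    Illustrative ordering only (an assumption, not FIT's exact encoding)."""
    h, w = img.shape
    # Centered spectrum: after fftshift the DC term sits at (h // 2, w // 2).
    spec = np.fft.fftshift(np.fft.fft2(img))
    # Integer frequency coordinates matching the fftshift layout.
    yy, xx = np.meshgrid(np.arange(h) - h // 2, np.arange(w) - w // 2, indexing="ij")
    radius = np.hypot(yy, xx).ravel()
    # Stable sort gives a deterministic order among coefficients of equal radius.
    order = np.argsort(radius, kind="stable")
    return spec.ravel()[order], order


def decode_prefix(seq, order, shape, k):
    """Reconstruct an image from only the first k coefficients of the sequence.
    All later (higher-frequency) coefficients are treated as zero, so the result
    is a low-pass, reduced-resolution view of the whole image."""
    spec = np.zeros(shape[0] * shape[1], dtype=complex)
    spec[order[:k]] = seq[:k]
    spec = spec.reshape(shape)
    # Take the real part: a prefix may include only one coefficient of a
    # conjugate-symmetric pair, which leaves an imaginary component.
    return np.fft.ifft2(np.fft.ifftshift(spec)).real


# Usage: a short prefix yields a coarse image; the full sequence is lossless.
rng = np.random.default_rng(0)
image = rng.random((32, 32))
sequence, idx = fourier_domain_encoding(image)
coarse = decode_prefix(sequence, idx, image.shape, k=64)
full = decode_prefix(sequence, idx, image.shape, k=image.size)
```

Under this ordering, a prefix of length 1 contains only the DC term, so the decoded image is constant at the image mean. Longer prefixes add progressively higher spatial frequencies. Predicting the next coefficients auto-regressively therefore amounts to super-resolving the current low-pass view, which is the equivalence the abstract describes.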