Abstract
Data representation in non-Euclidean spaces has proven effective for capturing hierarchical and complex relationships in real-world datasets. Hyperbolic spaces, in particular, provide efficient embeddings for hierarchical structures. This paper introduces the Hyperbolic Vision Transformer (HVT), a novel extension of the Vision Transformer (ViT) that integrates hyperbolic geometry. While traditional ViTs operate in Euclidean space, our method enhances the self-attention mechanism by leveraging hyperbolic distance and M\"obius transformations. This enables more effective modeling of hierarchical and relational dependencies in image data. We present rigorous mathematical formulations showing how hyperbolic geometry can be incorporated into attention layers, feed-forward networks, and optimization. We report improved image classification performance on the ImageNet dataset.
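
To make the two geometric ingredients named above concrete, the following is a minimal sketch of M\"obius addition and hyperbolic distance on the Poincar\'e ball, together with one plausible way to turn distances into attention weights. It assumes the unit ball with curvature -1 and a softmax over negative distances; the abstract does not specify the model's exact formulation, so the function names, the `temperature` parameter, and the scoring rule are illustrative assumptions rather than the HVT definition.

```python
import math

def mobius_add(x, y):
    """Mobius addition x (+) y on the unit Poincare ball (curvature -1 assumed)."""
    xy = sum(a * b for a, b in zip(x, y))
    x2 = sum(a * a for a in x)
    y2 = sum(b * b for b in y)
    denom = 1 + 2 * xy + x2 * y2
    return [((1 + 2 * xy + y2) * a + (1 - x2) * b) / denom
            for a, b in zip(x, y)]

def poincare_distance(x, y):
    """Hyperbolic distance d(x, y) = 2 * artanh(||(-x) (+) y||)."""
    diff = mobius_add([-a for a in x], y)
    norm = math.sqrt(sum(d * d for d in diff))
    # Clamp just below 1 so atanh stays finite for points near the boundary.
    return 2 * math.atanh(min(norm, 1 - 1e-12))

def hyperbolic_attention_weights(query, keys, temperature=1.0):
    """Illustrative scoring rule: softmax over negative hyperbolic distances."""
    scores = [-poincare_distance(query, k) / temperature for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

Under this scoring rule, keys that lie closer to the query in hyperbolic distance receive larger attention weights, which is the sense in which distance can replace the Euclidean dot product.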