Peripheral Vision Transformer

  • 2022-06-14 13:47:47
  • Juhong Min, Yucheng Zhao, Chong Luo, Minsu Cho


Human vision possesses a special type of visual processing system called peripheral vision. By partitioning the entire visual field into multiple contour regions based on their distance from the center of our gaze, peripheral vision provides us with the ability to perceive various visual features in different regions. In this work, we take a biologically inspired approach and explore how to model peripheral vision in deep neural networks for visual recognition. We propose to incorporate peripheral position encoding into the multi-head self-attention layers so that the network learns, from the training data, to partition the visual field into diverse peripheral regions. We evaluate the proposed network, dubbed PerViT, on the large-scale ImageNet dataset and systematically investigate the inner workings of the model for machine perception, showing that the network learns to perceive visual data similarly to the way human vision does. State-of-the-art performance on the image classification task across various model sizes demonstrates the efficacy of the proposed method.
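The abstract does not give the exact form of the peripheral position encoding, so what follows is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes that (1) the position term for a query-key pair depends only on the Euclidean distance between their grid positions, so all keys on the same "ring" around the query get the same weight; (2) that distance is mapped to a weight by a tiny MLP whose parameters would be learned per head (here hand-picked); and (3) content attention and position weights are combined by elementwise multiplication followed by row renormalization. The names `peripheral_weight`, `peripheral_attention`, and the `params` dictionary are hypothetical.

```python
import math
import random


def grid_positions(height, width):
    """(row, col) coordinates of a height x width token grid in raster order."""
    return [(r, c) for r in range(height) for c in range(width)]


def peripheral_weight(distance, params):
    """Hypothetical distance-to-weight mapping: a tiny ReLU MLP with a sigmoid
    output, so every weight lies in (0, 1). In a trained model these
    parameters would be learned per head; here they are hand-picked."""
    hidden = [max(0.0, u * distance + b) for u, b in zip(params["u"], params["b"])]
    logit = sum(v * h for v, h in zip(params["v"], hidden)) + params["c"]
    return 1.0 / (1.0 + math.exp(-logit))


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]


def peripheral_attention(queries, keys, values, positions, params):
    """Single-head attention whose content weights are reweighted by a
    position-only term computed from query-key distance (assumed design)."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    outputs, attn_rows = [], []
    for qi, q in enumerate(queries):
        # Content attention: scaled dot product, then softmax over keys.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        content = softmax(scores)
        # Position attention: depends only on the distance between the query
        # position and each key position, i.e. on the peripheral "ring".
        position = [peripheral_weight(math.dist(positions[qi], p), params)
                    for p in positions]
        # Combine multiplicatively, then renormalize so the row sums to 1.
        mixed = [c * w for c, w in zip(content, position)]
        total = sum(mixed)
        row = [m / total for m in mixed]
        attn_rows.append(row)
        # Weighted sum of value vectors for this query.
        outputs.append([sum(a * v[j] for a, v in zip(row, values))
                        for j in range(len(values[0]))])
    return outputs, attn_rows


# Usage: 16 random tokens on a 4 x 4 grid, one hand-picked "central" head
# whose weight is high near the query and decays with distance.
random.seed(0)
positions = grid_positions(4, 4)
n, d = len(positions), 8
tokens = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
central = {"u": [1.0], "b": [0.0], "v": [-2.0], "c": 2.0}
out, attn = peripheral_attention(tokens, tokens, tokens, positions, central)
```

In a multi-head layer each head would carry its own `params`, so one head could concentrate on the innermost region while another favors a more distant ring; letting training choose these per-head mappings is what allows the network to partition the visual field into diverse peripheral regions, as the abstract describes.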


