Abstract
We study a referential game (a type of signaling game) in which two agents communicate with each other via a discrete bottleneck to achieve a common goal. In our referential game, the goal of the speaker is to compose a message or a symbolic representation of "important" image patches, while the task for the listener is to match the speaker's message to a different view of the same image. We show that it is indeed possible for the two agents to develop a communication protocol without explicit or implicit supervision. We further investigate the developed protocol and show its applications in speeding up recent Vision Transformers by using only important patches, and as pre-training for downstream recognition tasks (e.g., classification). Code is available at https://github.com/kampta/PatchGame.