Abstract
Self-supervised contrastive learning has progressed rapidly over the last few years. Most state-of-the-art self-supervised algorithms rely on a large number of negative samples, momentum updates, specific architectural modifications, or extensive training to learn good representations. Such arrangements make the overall training process complex and difficult to analyze mathematically. In this paper, we propose a loss function for contrastive learning based on mutual information optimization, in which we cast contrastive learning as a binary classification problem: predicting whether a pair is positive or not. This formulation not only makes the problem mathematically tractable but also helps us outperform existing algorithms. Unlike existing methods, which only maximize the mutual information in a positive pair, the proposed loss function optimizes the mutual information in both positive and negative pairs. We also present mathematical expressions for the parameter gradients flowing into the projector and for the displacement of the feature vectors in the feature space. These expressions give mathematical insight into the working principle of contrastive learning. An additive $L_2$ regularizer is also used to prevent the feature vectors from diverging and to improve performance. The proposed method outperforms state-of-the-art algorithms on the benchmark datasets STL-10, CIFAR-10, and CIFAR-100. After only 250 epochs of pre-training, the proposed model achieves best accuracies of 85.44\%, 60.75\%, and 56.81\% on the CIFAR-10, STL-10, and CIFAR-100 datasets, respectively.
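
To make the pair-classification view concrete, the following is a minimal sketch and not the loss derived in this paper. It assumes that the probability of a pair being positive is the sigmoid of the dot product of the two feature vectors, that the classification term is a plain binary cross-entropy averaged over all pairs, and that the $L_2$ regularizer is the mean squared norm of the features. The names `pairwise_bce_loss`, `image_ids`, and `l2_weight` are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dot(u, v):
    # Dot product of two equal-length vectors given as lists.
    return sum(a * b for a, b in zip(u, v))


def pairwise_bce_loss(features, image_ids, l2_weight=0.01):
    """Illustrative pair-classification loss (assumed form, not the paper's).

    features  : list of feature vectors, one per augmented view
    image_ids : source-image index of each view; two views form a
                positive pair when their indices match
    l2_weight : weight of the additive L2 penalty (hypothetical default)
    """
    n = len(features)
    if n < 2:
        raise ValueError("need at least two feature vectors to form a pair")
    eps = 1e-12  # guards against log(0)
    total, count = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            # Assumed model: P(pair is positive) = sigmoid(z_i . z_j).
            p = sigmoid(dot(features[i], features[j]))
            y = 1.0 if image_ids[i] == image_ids[j] else 0.0
            # Binary cross-entropy over positive and negative pairs alike.
            total -= y * math.log(p + eps) + (1.0 - y) * math.log(1.0 - p + eps)
            count += 1
    bce = total / count
    # Additive L2 term: mean squared norm, discouraging features from diverging.
    l2 = sum(dot(z, z) for z in features) / n
    return bce + l2_weight * l2
```

Under these assumptions, features that agree within positive pairs and oppose each other across negative pairs receive a lower loss than the same features with mismatched pair labels, and increasing `l2_weight` penalizes large feature norms more heavily.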