Vision-based reinforcement learning (RL) is a promising approach to solving control tasks in which images are the main observation. State-of-the-art RL algorithms still struggle with sample efficiency, especially when using image observations. This has led to increased attention on integrating state representation learning (SRL) techniques into the RL pipeline. Work in this field demonstrates a substantial improvement in sample efficiency, among other benefits. However, taking full advantage of this paradigm depends crucially on the quality of the samples used for training. More importantly, the diversity of these samples could affect not only the sample efficiency of vision-based RL but also its generalization capability. In this work, we present an approach to improve sample diversity for state representation learning. Our method enhances the exploration capability of RL algorithms by taking advantage of the SRL setup. Our experiments show that our proposed approach boosts the visitation of problematic states, improves the learned state representation, and outperforms the baselines for all tested environments. These results are most apparent for environments where the baseline methods struggle. Even in simple environments, our method stabilizes the training, reduces the reward variance, and promotes sample efficiency.