In model-free deep reinforcement learning (RL) algorithms, using noisy value estimates to supervise policy evaluation and optimization is detrimental to the sample efficiency. As this noise is heteroscedastic, its effects can be mitigated using uncertainty-based weights in the optimization process. Previous methods rely on sampled ensembles, which do not capture all aspects of uncertainty. We provide a systematic analysis of the sources of uncertainty in the noisy supervision that occurs in RL, and introduce inverse-variance RL, a Bayesian framework which combines probabilistic ensembles and Batch Inverse Variance weighting. We propose a method whereby two complementary uncertainty estimation methods account for both the Q-value and the environment stochasticity to better mitigate the negative impacts of noisy supervision. Our results show significant improvement in terms of sample efficiency on discrete and continuous control tasks.
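
To make the idea of uncertainty-based weighting concrete, the following is a minimal sketch of an inverse-variance weighted squared-error loss over a mini-batch of value targets. It is an illustration under stated assumptions, not the exact formulation of the framework summarized above: the function names, the stabilizing constant `eps`, and the normalization of weights over the batch are all hypothetical choices, and the per-target variances are assumed to come from some external uncertainty estimator.

```python
def inverse_variance_weights(variances, eps=1e-3):
    """Return batch-normalized weights proportional to 1 / (variance + eps).

    `eps` is an assumed stabilizing constant that keeps a single
    near-zero-variance target from dominating the whole batch.
    """
    if not variances:
        raise ValueError("variances must be non-empty")
    if any(v < 0 for v in variances):
        raise ValueError("variances must be non-negative")
    raw = [1.0 / (v + eps) for v in variances]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_value_loss(predictions, targets, target_variances, eps=1e-3):
    """Squared error between predictions and targets, weighted per sample.

    Targets with a larger estimated variance contribute less to the loss,
    so noisier supervision has less influence on the update.
    """
    if not (len(predictions) == len(targets) == len(target_variances)):
        raise ValueError("all inputs must have the same length")
    weights = inverse_variance_weights(target_variances, eps)
    return sum(
        w * (p - t) ** 2
        for w, p, t in zip(weights, predictions, targets)
    )


if __name__ == "__main__":
    # Hypothetical Q-value predictions, bootstrapped targets, and the
    # estimated variance of each target.
    q_pred = [1.0, 2.0, 3.0]
    q_target = [1.5, 2.0, 5.0]
    target_var = [0.1, 1.0, 10.0]
    print(inverse_variance_weights(target_var))
    print(weighted_value_loss(q_pred, q_target, target_var))
```

In this sketch the third target has the largest error but also the largest estimated variance, so it receives the smallest weight. With equal variances the loss reduces to an ordinary mean squared error.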