We consider the problem of detecting out-of-distribution (OOD) samples in deep reinforcement learning. In a value-based reinforcement learning setting, we propose to use uncertainty estimation techniques directly on the agent's value-estimating neural network to detect OOD samples. The focus of our work lies in analyzing the suitability of approximate Bayesian inference methods and related ensembling techniques that generate uncertainty estimates. Although prior work has shown that dropout-based variational inference techniques and bootstrap-based approaches can be used to model epistemic uncertainty, their suitability for detecting OOD samples in deep reinforcement learning remains an open question. Our results show that uncertainty estimation can be used to differentiate in-distribution from out-of-distribution samples. Over the complete training process of the reinforcement learning agents, bootstrap-based approaches tend to produce more reliable epistemic uncertainty estimates than dropout-based approaches.
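To make the bootstrap-based idea concrete, the following is a minimal sketch and not the implementation evaluated in this work. It assumes that small linear least-squares models stand in for the value-estimating neural networks, and that epistemic uncertainty is taken as the standard deviation of the Q-value predictions across ensemble members. The function names `fit_bootstrap_ensemble` and `epistemic_uncertainty` are hypothetical and chosen for illustration only.

```python
import numpy as np


def fit_bootstrap_ensemble(states, targets, n_members=10, seed=0):
    """Fit one linear Q-model per bootstrap resample of the training data.

    Assumption: linear least-squares models replace the neural networks
    purely to keep the sketch short and dependency-free.
    """
    rng = np.random.default_rng(seed)
    n = states.shape[0]
    # Append a constant column so each model also learns a bias term.
    features = np.hstack([states, np.ones((n, 1))])
    members = []
    for _ in range(n_members):
        # Bootstrap: sample n training indices with replacement.
        idx = rng.integers(0, n, size=n)
        weights, *_ = np.linalg.lstsq(features[idx], targets[idx], rcond=None)
        members.append(weights)
    # Shape: (n_members, state_dim + 1, n_actions)
    return np.stack(members)


def epistemic_uncertainty(members, states):
    """Per-state disagreement of the ensemble, averaged over actions."""
    features = np.hstack([states, np.ones((states.shape[0], 1))])
    # q_values has shape (n_members, n_states, n_actions).
    q_values = np.einsum("nd,mda->mna", features, members)
    # Standard deviation across members, then mean across actions.
    return q_values.std(axis=0).mean(axis=1)
```

Under these assumptions, states far from the training data should receive larger uncertainty scores than states drawn from the training distribution. One possible detection rule, again an assumption rather than the procedure used here, is to flag a state as OOD when its score exceeds a high quantile of the scores observed on in-distribution states.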