Deep learning-based approaches for generating novel drug molecules with specific properties have attracted considerable interest in the last few years. Recent studies have demonstrated promising performance for string-based generation of novel molecules using reinforcement learning. In this paper, we develop a unified framework for using reinforcement learning for de novo drug design, wherein we systematically study various on- and off-policy reinforcement learning algorithms and replay buffers to learn an RNN-based policy to generate novel molecules predicted to be active against the dopamine receptor DRD2. Our findings suggest that, when structural diversity is essential, it is advantageous to update the policy with both top-scoring and low-scoring molecules at a minimum. Using all generated molecules at an iteration seems to enhance performance stability for on-policy algorithms. In addition, when replaying high-, intermediate-, and low-scoring molecules, off-policy algorithms display the potential to improve the structural diversity and number of active molecules generated, but possibly at the cost of a longer exploration phase. Our work provides an open-source framework enabling researchers to investigate various reinforcement learning methods for de novo drug design.
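To make the idea of updating the policy with both top-scoring and low-scoring molecules concrete, the following is a minimal sketch of one possible selection step. It is an illustration under stated assumptions and not the framework's actual implementation: the function name `select_for_update`, the parameter `k`, and the example SMILES strings and scores are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: (SMILES string, predicted activity score).
Scored = Tuple[str, float]


def select_for_update(batch: List[Scored], k: int) -> List[Scored]:
    """Return the k top-scoring and k low-scoring molecules of a batch.

    Assumption for illustration: when the batch is too small to split,
    every molecule is used, which mirrors the "use all generated
    molecules" setting mentioned in the abstract.
    """
    if k <= 0:
        return []
    # Rank from highest to lowest score.
    ranked = sorted(batch, key=lambda item: item[1], reverse=True)
    if 2 * k >= len(ranked):
        return ranked
    return ranked[:k] + ranked[-k:]


# Illustrative placeholder scores, not results from the paper.
batch = [
    ("CCO", 0.10),
    ("c1ccccc1", 0.35),
    ("CCN", 0.80),
    ("CCC", 0.55),
    ("CO", 0.95),
    ("CC", 0.05),
]
selected = select_for_update(batch, k=2)
```

Here `selected` holds the two highest-scoring and the two lowest-scoring entries, and the intermediate ones are dropped. A replay buffer that also keeps intermediate-scoring molecules would retain more of `ranked` instead.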