Abstract
We present a framework, which we call Molecule Deep $Q$-Networks (MolDQN), for molecule optimization that combines domain knowledge of chemistry with state-of-the-art reinforcement learning techniques (double $Q$-learning and randomized value functions). We directly define modifications on molecules, thereby ensuring 100\% chemical validity. Further, we operate without pre-training on any dataset to avoid possible bias from the choice of that set. Inspired by problems faced during medicinal chemistry lead optimization, we extend our model with multi-objective reinforcement learning, which maximizes drug-likeness while maintaining similarity to the original molecule. To understand how the model works, we further show the path through chemical space that it takes to optimize a molecule.
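
As a rough illustration of two ideas named above, the following minimal sketch shows a standard double $Q$-learning target and one possible way to fold two objectives into a single reward. The weighted-sum scalarization, the function names (`scalarized_reward`, `double_q_target`), and all parameters are assumptions made for illustration; the abstract does not specify how the objectives are combined, and this is not the MolDQN implementation.

```python
from typing import Sequence


def scalarized_reward(similarity: float, drug_likeness: float, weight: float) -> float:
    """Combine two objectives into one scalar reward.

    Assumption: a simple weighted sum; the source does not state the scheme.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * similarity + (1.0 - weight) * drug_likeness


def double_q_target(
    reward: float,
    gamma: float,
    online_next_q: Sequence[float],
    target_next_q: Sequence[float],
    terminal: bool,
) -> float:
    """Standard double Q-learning target for one transition.

    The online network selects the next action; the target network
    evaluates it, which reduces the overestimation of plain Q-learning.
    """
    if terminal:
        return reward
    # Action selection uses the online estimates ...
    best_action = max(range(len(online_next_q)), key=lambda a: online_next_q[a])
    # ... while the value of that action comes from the target estimates.
    return reward + gamma * target_next_q[best_action]


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to exercise the functions.
    r = scalarized_reward(similarity=0.8, drug_likeness=0.6, weight=0.5)
    y = double_q_target(r, 0.9, [0.2, 0.5, 0.1], [0.3, 0.4, 0.9], terminal=False)
    print(r, y)
```

Here the online estimates pick the second action, so the target uses the target network's value for that action (0.4) rather than its own maximum (0.9). Changing `weight` shifts the trade-off between staying close to the original molecule and improving drug-likeness.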