Chemical space is routinely explored by machine learning methods to discover interesting molecules before time-consuming experimental synthesis is attempted. However, these methods often rely on a graph representation, ignoring the 3D information necessary for determining the stability of the molecules. We propose a reinforcement learning approach for generating molecules in Cartesian coordinates, which allows quantum chemical prediction of their stability. To improve sample efficiency, we first learn basic chemical rules through imitation learning on the GDB-11 database, creating an initial model applicable to all stoichiometries. We then deploy multiple copies of the model, each conditioned on a specific stoichiometry, in a reinforcement learning setting. The models correctly identify low-energy molecules in the database and produce novel isomers not found in the training set. Finally, we apply the model to larger molecules to show how reinforcement learning further refines the imitation learning model in domains far from the training data.