Molecular property prediction is one of the fastest-growing applications of deep learning with critical real-world impacts. Including 3D molecular structure as input to learned models improves their performance for many molecular tasks. However, this information is infeasible to compute at the scale required by several real-world applications. We propose pre-training a model to reason about the geometry of molecules given only their 2D molecular graphs. Using methods from self-supervised learning, we maximize the mutual information between 3D summary vectors and the representations of a Graph Neural Network (GNN) such that they contain latent 3D information. During fine-tuning on molecules with unknown geometry, the GNN still generates implicit 3D information and can use it to improve downstream tasks. We show that 3D pre-training provides significant improvements for a wide range of properties, such as a 22% average MAE reduction on eight quantum mechanical properties. Moreover, the learned representations can be effectively transferred between datasets in different molecular spaces.
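
To make the pre-training objective concrete, the following is a minimal sketch of one common way to maximize a lower bound on mutual information between paired embeddings: an InfoNCE-style contrastive loss. The text above does not specify the estimator, so the choice of InfoNCE, cosine similarity, and the temperature value are assumptions, and the names `info_nce_loss`, `z_2d`, and `z_3d` are hypothetical. Here `z_2d[i]` stands for the GNN representation of molecule `i` and `z_3d[i]` for its 3D summary vector.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(z_2d, z_3d, temperature=0.1):
    """InfoNCE-style loss (assumed estimator, not confirmed by the text).

    Row i of z_2d and row i of z_3d describe the same molecule (positive
    pair); every other row in the batch serves as a negative.
    """
    n = len(z_2d)
    total = 0.0
    for i in range(n):
        logits = [cosine(z_2d[i], z_3d[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidates in the batch.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of picking the true partner.
        total += log_denominator - logits[i]
    return total / n


# Toy batch: three molecules with 2-dimensional embeddings (illustrative only).
z_2d = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
aligned = info_nce_loss(z_2d, z_2d)                           # matching pairs
shuffled = info_nce_loss(z_2d, [z_2d[1], z_2d[2], z_2d[0]])   # mismatched pairs
print(aligned < shuffled)
```

Minimizing such a loss pulls each 2D-graph representation toward the 3D summary vector of the same molecule and pushes it away from those of other molecules, which is one way the 2D encoder could come to carry latent 3D information.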