Abstract
Leveraging new data sources is a key step in accelerating the pace of materials design and discovery. To complement the strides in synthesis planning driven by historical, experimental, and computed data, we present an automated method for connecting scientific literature to synthesis insights. Starting from natural language text, we apply word embeddings from language models, which are fed into a named entity recognition model, upon which a conditional variational autoencoder is trained to generate syntheses for arbitrary materials. We show the potential of this technique by predicting precursors for two perovskite materials, using only training data published over a decade prior to their first reported syntheses. We demonstrate that the model learns representations of materials corresponding to synthesis-related properties, and that the model's behavior complements existing thermodynamic knowledge. Finally, we apply the model to perform synthesizability screening for proposed novel perovskite compounds.