Abstract
Powder X-ray diffraction (pXRD) experiments are a cornerstone for materials structure characterization. Despite their widespread application, analyzing pXRD diffractograms still presents a significant challenge to automation and a bottleneck in high-throughput discovery in self-driving labs. Machine learning promises to resolve this bottleneck by enabling automated powder diffraction analysis. A notable difficulty in applying machine learning to this domain is the lack of sufficiently sized experimental datasets, which has constrained researchers to train primarily on simulated data. However, models trained on simulated pXRD patterns showed limited generalization to experimental patterns, particularly for low-quality experimental patterns with high noise levels and elevated backgrounds. With the Open Experimental Powder X-Ray Diffraction Database (opXRD), we provide an openly available and easily accessible dataset of labeled and unlabeled experimental powder diffractograms. Labeled opXRD data can be used to evaluate the performance of models on experimental data and unlabeled opXRD data can help improve the performance of models on experimental data, e.g. through transfer learning methods. We collected 92552 diffractograms, 2179 of them labeled, from a wide spectrum of materials classes. We hope this ongoing effort can guide machine learning research toward fully automated analysis of pXRD data and thus enable future self-driving materials labs.