mlf-core: a framework for deterministic machine learning

  • 2021-04-15 17:58:03
  • Lukas Heumos, Philipp Ehmele, Kevin Menden, Luis Kuhn Cuellar, Edmund Miller, Steffen Lemke, Gisela Gabernet, Sven Nahnsen
  • 28

Abstract

Machine learning has shown extensive growth in recent years. However, previously existing studies highlighted a reproducibility crisis in machine learning. The reasons for irreproducibility are manifold. Major machine learning libraries default to the usage of non-deterministic algorithms based on atomic operations. Solely fixing all random seeds is not sufficient for deterministic machine learning. To overcome this shortcoming, various machine learning libraries released deterministic counterparts to the non-deterministic algorithms. We evaluated the effect of these algorithms on determinism and runtime. Based on these results, we formulated a set of requirements for reproducible machine learning and developed a new software solution, the mlf-core ecosystem, which aids machine learning projects to meet and keep these requirements. We applied mlf-core to develop fully reproducible models in various biomedical fields including a single cell autoencoder with TensorFlow, a PyTorch-based U-Net model for liver-tumor segmentation in CT scans, and a liver cancer classifier based on gene expression profiles with XGBoost.
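
To illustrate why fixing random seeds alone may not be sufficient, the following minimal sketch shows the general floating-point effect that makes order-dependent accumulation non-deterministic. It is not code from mlf-core; the helper name `accumulate` and the chosen values are assumptions made purely for illustration. When atomic operations let parallel threads add their partial results in whatever order they finish, the operands are identical across runs but the order is not, and floating-point addition is not associative.

```python
def accumulate(values, order):
    """Sum `values` in the given index order using plain float addition.

    Hypothetical helper: it stands in for an accumulation whose order is
    decided at run time, as with atomic adds on parallel hardware.
    """
    total = 0.0
    for i in order:
        total += values[i]
    return total


# Identical operands, two different accumulation orders.
values = [1e16, 1.0, -1e16]

# (1e16 + 1.0) - 1e16: the 1.0 is lost to rounding before the cancellation.
in_order = accumulate(values, [0, 1, 2])

# (1e16 - 1e16) + 1.0: the large terms cancel first, so the 1.0 survives.
reordered = accumulate(values, [0, 2, 1])

print(in_order, reordered)  # the two sums differ although the inputs are the same
```

No random numbers are involved here, so no seed can remove the discrepancy; only fixing the order of accumulation does, which is what deterministic algorithm variants aim for, typically at some cost in runtime.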

 
