The carbon cost of materials discovery: Can machine learning really accelerate the discovery of new photovoltaics?

  • 2025-07-17 15:55:02
  • Matthew Walker, Keith T. Butler

Abstract

Computational screening has become a powerful complement to experimental efforts in the discovery of high-performance photovoltaic (PV) materials. Most workflows rely on density functional theory (DFT) to estimate electronic and optical properties relevant to solar energy conversion. Although more efficient than laboratory-based methods, DFT calculations still entail substantial computational and environmental costs. Machine learning (ML) models have recently gained attention as surrogates for DFT, offering drastic reductions in resource use with competitive predictive performance. In this study, we reproduce a canonical DFT-based workflow to estimate the maximum efficiency limit and progressively replace its components with ML surrogates. By quantifying the CO$_2$ emissions associated with each computational strategy, we evaluate the trade-offs between predictive efficacy and environmental cost. Our results reveal multiple hybrid ML/DFT strategies that optimize different points along the accuracy--emissions front. We find that direct prediction of scalar quantities, such as maximum efficiency, is significantly more tractable than using predicted absorption spectra as an intermediate step. Interestingly, ML models trained on DFT data can outperform DFT workflows using alternative exchange--correlation functionals in screening applications, highlighting the consistency and utility of data-driven approaches. We also assess strategies to improve ML-driven screening through expanded datasets and improved model architectures tailored to PV-relevant features. This work provides a quantitative framework for building low-emission, high-throughput discovery pipelines.
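
To illustrate what an accuracy--emissions front means in practice, the following is a minimal sketch, not the authors' implementation. It assumes each computational strategy can be summarized by two numbers, a prediction error and an emissions estimate, and keeps only the strategies that no other strategy beats on both. The `Strategy` type, the `pareto_front` function, the strategy labels, and all numeric values are hypothetical placeholders, not results from the paper.

```python
from typing import NamedTuple


class Strategy(NamedTuple):
    name: str         # hypothetical label for a workflow variant
    error: float      # prediction error against a reference (lower is better)
    emissions: float  # estimated CO2 emissions (lower is better)


def pareto_front(strategies: list[Strategy]) -> list[Strategy]:
    """Return the strategies that are not dominated in both error and emissions."""
    front = []
    for s in strategies:
        # s is dominated if another strategy is at least as good on both
        # axes and strictly better on at least one of them.
        dominated = any(
            o.error <= s.error
            and o.emissions <= s.emissions
            and (o.error < s.error or o.emissions < s.emissions)
            for o in strategies
        )
        if not dominated:
            front.append(s)
    # Order the front from lowest to highest emissions for readability.
    return sorted(front, key=lambda s: s.emissions)


# Placeholder values chosen only to exercise the function;
# they are NOT measurements or results from the paper.
candidates = [
    Strategy("A", error=0.0, emissions=100.0),
    Strategy("B", error=2.0, emissions=10.0),
    Strategy("C", error=3.0, emissions=50.0),
    Strategy("D", error=5.0, emissions=1.0),
]
front = pareto_front(candidates)
print([s.name for s in front])
```

In this toy input, strategy C is dropped because B has both lower error and lower emissions; the remaining three each represent a different trade-off point along the front.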

 
