Carbon Emissions and Large Neural Network Training

  • 2021-04-21 04:44:25
  • David Patterson, Joseph Gonzalez, Quoc Le, Chen Liang, Lluis-Miquel Munguia, Daniel Rothchild, David So, Maud Texier, Jeff Dean

Abstract

The computation demand for machine learning (ML) has grown rapidly in recent years, which comes with a number of costs. Estimating the energy cost helps measure its environmental impact and find greener strategies, yet it is challenging without detailed information. We calculate the energy use and carbon footprint of several recent large models (T5, Meena, GShard, Switch Transformer, and GPT-3) and refine earlier estimates for the neural architecture search that found Evolved Transformer. We highlight the following opportunities to improve energy efficiency and CO2 equivalent emissions (CO2e): Large but sparsely activated DNNs can consume <1/10th the energy of large, dense DNNs without sacrificing accuracy despite using as many or even more parameters. Geographic location matters for ML workload scheduling since the fraction of carbon-free energy and resulting CO2e vary ~5X-10X, even within the same country and the same organization. We are now optimizing where and when large models are trained. Specific datacenter infrastructure matters, as Cloud datacenters can be ~1.4-2X more energy efficient than typical datacenters, and the ML-oriented accelerators inside them can be ~2-5X more effective than off-the-shelf systems. Remarkably, the choice of DNN, datacenter, and processor can reduce the carbon footprint up to ~100-1000X. These large factors also make retroactive estimates of energy cost difficult. To avoid miscalculations, we believe ML papers requiring large computational resources should make energy consumption and CO2e explicit when practical. We are working to be more transparent about energy use and CO2e in our future research. To help reduce the carbon footprint of ML, we believe energy usage and CO2e should be a key metric in evaluating models, and we are collaborating with MLPerf developers to include energy usage during training and inference in this industry standard benchmark.
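One way to see how the individual factors in the abstract could add up to the quoted ~100-1000X is to multiply their ranges together. The sketch below is only an illustration under stated assumptions, not the authors' calculation: it assumes the four factors are independent and multiplicative, it reads "<1/10th the energy" as exactly a 10X reduction, and it combines energy-efficiency factors with the CO2e factor for location as if they were interchangeable. The names `FACTORS` and `combined_range` are hypothetical.

```python
from math import prod

# (low, high) reduction factors taken from the ranges quoted in the abstract.
# Assumption: "<1/10th the energy" for sparse DNNs is treated as exactly 10x.
FACTORS = {
    "sparse vs. dense DNN": (10.0, 10.0),
    "datacenter location (carbon-free energy)": (5.0, 10.0),
    "cloud vs. typical datacenter": (1.4, 2.0),
    "ML accelerator vs. off-the-shelf system": (2.0, 5.0),
}


def combined_range(factors):
    """Multiply the low ends and the high ends, assuming independent factors."""
    low = prod(lo for lo, _ in factors.values())
    high = prod(hi for _, hi in factors.values())
    return low, high


low, high = combined_range(FACTORS)
print(f"combined reduction: ~{low:.0f}x to ~{high:.0f}x")
```

Under these assumptions the product runs from roughly 140X to 1000X, which is in line with the ~100-1000X range stated in the abstract.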

 
