Abstract
Machine-learning (ML) hardware and software system demand is burgeoning. Driven by ML applications, the number of different ML inference systems has exploded. Over 100 organizations are building ML inference chips, and the systems that incorporate existing models span at least three orders of magnitude in power consumption and four orders of magnitude in performance; they range from embedded devices to data-center solutions. Fueling the hardware are a dozen or more software frameworks and libraries. The myriad combinations of ML hardware and ML software make assessing ML-system performance in an architecture-neutral, representative, and reproducible manner challenging. There is a clear need for industry-wide standard ML benchmarking and evaluation criteria. MLPerf Inference answers that call. Driven by more than 30 organizations as well as more than 200 ML engineers and practitioners, MLPerf implements a set of rules and practices to ensure comparability across systems with wildly differing architectures. In this paper, we present the method and design principles of the initial MLPerf Inference release. The first call for submissions garnered more than 600 inference-performance measurements from 14 organizations, representing over 30 systems that show a range of capabilities.