MLPerf Inference Benchmark

  • 2019-11-06 18:43:10
  • Vijay Janapa Reddi, Christine Cheng, David Kanter, Peter Mattson, Guenther Schmuelling, Carole-Jean Wu, Brian Anderson, Maximilien Breughe, Mark Charlebois, William Chou, Ramesh Chukka, Cody Coleman, Sam Davis, Pan Deng, Greg Diamos, Jared Duke, Dave Fick, J. Scott Gardner, Itay Hubara, Sachin Idgunji, Thomas B. Jablin, Jeff Jiao, Tom St. John, Pankaj Kanwar, David Lee, Jeffery Liao, Anton Lokhmotov, Francisco Massa, Peng Meng, Paulius Micikevicius, Colin Osborne, Gennady Pekhimenko, Arun Tejusve Raghunath Rajan, Dilip Sequeira, Ashish Sirasao, Fei Sun, Hanlin Tang, Michael Thomson, Frank Wei, Ephrem Wu, Lingjie Xu, Koichi Yamada, Bing Yu, George Yuan, Aaron Zhong, Peizhao Zhang, Yuchen Zhou


Machine-learning (ML) hardware and software system demand is burgeoning. Driven by ML applications, the number of different ML inference systems has exploded. Over 100 organizations are building ML inference chips, and the systems that incorporate existing models span at least three orders of magnitude in power consumption and four orders of magnitude in performance; they range from embedded devices to data-center solutions. Fueling the hardware are a dozen or more software frameworks and libraries. The myriad combinations of ML hardware and ML software make assessing ML-system performance in an architecture-neutral, representative, and reproducible manner challenging. There is a clear need for industry-wide standard ML benchmarking and evaluation criteria. MLPerf Inference answers that call. Driven by more than 30 organizations as well as more than 200 ML engineers and practitioners, MLPerf implements a set of rules and practices to ensure comparability across systems with wildly differing architectures. In this paper, we present the method and design principles of the initial MLPerf Inference release. The first call for submissions garnered more than 600 inference-performance measurements from 14 organizations, representing over 30 systems that show a range of capabilities.

