LoopTune: Optimizing Tensor Computations with Reinforcement Learning

Abstract

Advanced compiler technology is crucial for enabling machine learning applications to run on novel hardware, but traditional compilers fail to deliver performance, popular auto-tuners have long search times, and expert-optimized libraries introduce unsustainable costs. To address this, we developed LoopTune, a deep reinforcement learning compiler that optimizes tensor computations in deep learning models for the CPU. LoopTune optimizes tensor traversal order while using the ultra-fast lightweight code generator LoopNest to perform hardware-specific optimizations. With a novel graph-based representation and action space, LoopTune speeds up LoopNest by 3.2x, generating code an order of magnitude faster than TVM, 2.8x faster than MetaSchedule, and 1.08x faster than AutoTVM, while consistently performing at the level of the hand-tuned library NumPy. Moreover, LoopTune tunes code on the order of seconds.
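To make "tensor traversal order" concrete, the following is a minimal illustrative sketch, not LoopTune's or LoopNest's actual API. It assumes a plain matrix multiplication over nested Python lists and a hypothetical helper, `matmul_with_order`, that nests the `i`, `j`, `k` loops in any requested order. Every ordering produces the same result; what changes is the memory access pattern, which is the kind of choice a loop-order tuner searches over.

```python
from itertools import permutations


def matmul_with_order(A, B, order):
    """Compute C = A @ B with the i, j, k loops nested in the given order.

    `order` is a sequence such as ("i", "k", "j"), listed outermost first.
    This helper is hypothetical and exists only for illustration.
    """
    m, k_dim, n = len(A), len(B), len(B[0])
    extents = {"i": m, "j": n, "k": k_dim}
    C = [[0.0] * n for _ in range(m)]
    outer, middle, inner = order
    idx = {}
    for a in range(extents[outer]):
        idx[outer] = a
        for b in range(extents[middle]):
            idx[middle] = b
            for c in range(extents[inner]):
                idx[inner] = c
                i, j, k = idx["i"], idx["j"], idx["k"]
                # Same arithmetic for every ordering; only the order in
                # which elements of A, B, and C are visited differs.
                C[i][j] += A[i][k] * B[k][j]
    return C


A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]

# Evaluate all six loop orderings of the same computation.
results = {order: matmul_with_order(A, B, order) for order in permutations("ijk")}
```

In compiled code the orderings differ in cache behavior and vectorization opportunities, so their run times can differ substantially even though the outputs match. Interpreted Python hides most of that effect, so the sketch demonstrates only the equivalence of the orderings, not their relative speed.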
