Abstract
Deep learning is a promising tool to determine the physical model that describes our universe. To handle the considerable computational cost of this problem, we present CosmoFlow: a highly scalable deep learning application built on top of the TensorFlow framework. CosmoFlow uses efficient implementations of 3D convolution and pooling primitives, together with improvements in threading for many element-wise operations, to improve training performance on Intel(C) Xeon Phi(TM) processors. We also utilize the Cray PE Machine Learning Plugin for efficient scaling to multiple nodes. We demonstrate fully synchronous data-parallel training on 8192 nodes of Cori with 77% parallel efficiency, achieving 3.5 Pflop/s sustained performance. To our knowledge, this is the first large-scale science application of the TensorFlow framework at supercomputer scale with fully synchronous training. These enhancements enable us to process large 3D dark matter distributions and predict the cosmological parameters $\Omega_M$, $\sigma_8$ and $n_s$ with unprecedented accuracy.
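
To put the scaling figures in perspective, the short sketch below works through the arithmetic they imply. It assumes the common definition of parallel efficiency as achieved speedup divided by node count; the abstract does not state which baseline the 77% is measured against, so the derived numbers are illustrative only. The function and variable names are hypothetical and are not part of CosmoFlow.

```python
# Illustrative arithmetic only: assumes efficiency = speedup / node count.
# The three constants come from the abstract; everything else is derived.

def implied_speedup(nodes: int, efficiency: float) -> float:
    """Speedup over one node implied by a given parallel efficiency."""
    return nodes * efficiency


def per_node_rate(total_flops: float, nodes: int) -> float:
    """Average sustained flop/s per node for an aggregate rate."""
    return total_flops / nodes


NODES = 8192        # Cori nodes used for training
EFFICIENCY = 0.77   # reported parallel efficiency
SUSTAINED = 3.5e15  # reported sustained performance, in flop/s

speedup = implied_speedup(NODES, EFFICIENCY)
rate = per_node_rate(SUSTAINED, NODES)

print(f"implied speedup over one node: {speedup:.0f}x")
print(f"average sustained rate per node: {rate / 1e9:.0f} Gflop/s")
```

Under this assumption, 8192 nodes at 77% efficiency behave like roughly 6,300 ideally scaled nodes, and the aggregate 3.5 Pflop/s corresponds to about 427 Gflop/s sustained per node.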