A well-known issue of Batch Normalization is its significantly reduced effectiveness in the case of small mini-batch sizes. When a mini-batch contains few examples, the statistics upon which the normalization is defined cannot be reliably estimated from it during a training iteration. To address this problem, we present Cross-Iteration Batch Normalization (CBN), in which examples from multiple recent iterations are jointly utilized to enhance estimation quality. A challenge of computing statistics over multiple iterations is that the network activations from different iterations are not comparable to each other due to changes in network weights. We thus compensate for the network weight changes via a proposed technique based on Taylor polynomials, so that the statistics can be accurately estimated and batch normalization can be effectively applied. On object detection and image classification with small mini-batch sizes, CBN is found to outperform the original batch normalization and a direct calculation of statistics over previous iterations without the proposed compensation technique.
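
The compensation idea can be illustrated with a minimal sketch. It is not the implementation evaluated in this work. It assumes a hypothetical one-weight layer y = tanh(w * x), a made-up weight trajectory, and a first-order Taylor expansion applied to the batch mean only. The names `batch_mean`, `batch_mean_grad`, and `compensated_mean` are illustrative. The sketch compares three estimates of the mean activation under the current weight: the reference value obtained by re-evaluating all stored examples, a direct average of stale per-iteration means, and an average of Taylor-compensated means.

```python
import numpy as np

def batch_mean(w, x):
    """Mean activation of a toy one-weight layer y = tanh(w * x)."""
    return np.tanh(w * x).mean()

def batch_mean_grad(w, x):
    """Analytic derivative of batch_mean with respect to w."""
    return ((1.0 - np.tanh(w * x) ** 2) * x).mean()

def compensated_mean(w_old, w_new, x_old):
    """First-order Taylor estimate of an old batch's mean under the new weight."""
    return batch_mean(w_old, x_old) + batch_mean_grad(w_old, x_old) * (w_new - w_old)

# Hypothetical weight values at four consecutive iterations (assumption).
weights = [0.50, 0.53, 0.56, 0.59]
# One small mini-batch (two examples) per iteration (assumption).
batches = [
    np.array([0.6, 1.4]),
    np.array([0.9, 1.1]),
    np.array([0.5, 1.5]),
    np.array([0.8, 1.2]),
]
w_now = weights[-1]

# Reference: every stored example re-evaluated under the current weight.
exact = np.mean([batch_mean(w_now, x) for x in batches])
# Direct aggregation: stale means computed under the old weights.
naive = np.mean([batch_mean(w, x) for w, x in zip(weights, batches)])
# Compensated aggregation: each stale mean is shifted to the current weight.
comp = np.mean([compensated_mean(w, w_now, x) for w, x in zip(weights, batches)])

print("error of direct aggregation:     ", abs(naive - exact))
print("error of compensated aggregation:", abs(comp - exact))
```

In this toy setting the compensated estimate lies closer to the reference than the direct average, because the first-order term removes most of the drift caused by the weight change. A variance estimate would additionally require the second moment of the activations, which the sketch omits.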