An Empirical Study of Batch Normalization and Group Normalization in Conditional Computation

Abstract

Batch normalization has been widely used to improve optimization in deep neural networks. While the uncertainty in batch statistics can act as a regularizer, using these dataset statistics specific to the training set impairs generalization in certain tasks. Recently, alternative methods for normalizing feature activations in neural networks have been proposed. Among them, group normalization has been shown to yield similar, and in some domains even superior, performance to batch normalization. All these methods utilize a learned affine transformation after the normalization operation to increase representational power. Methods used in conditional computation define the parameters of these transformations as learnable functions of conditioning information. In this work, we study whether and where the conditional formulation of group normalization can improve generalization compared to conditional batch normalization. We evaluate performance on the tasks of visual question answering, few-shot learning, and conditional image generation.
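
To make the conditional formulation concrete, the following is a minimal NumPy sketch of conditional group normalization, not the implementation used in this study. It assumes activations of shape (N, C, H, W) and a conditioning vector per sample. The names `conditional_group_norm`, `w_gamma`, and `w_beta` are hypothetical, as is the choice of plain linear maps from the conditioning vector to the per-channel scale and shift.

```python
import numpy as np

def group_norm(x, num_groups, eps=1e-5):
    """Normalize x of shape (N, C, H, W) per sample, over groups of channels."""
    n, c, h, w = x.shape
    assert c % num_groups == 0, "channels must divide evenly into groups"
    g = x.reshape(n, num_groups, -1)
    # Statistics come from each sample alone, so no batch statistics are involved.
    mean = g.mean(axis=2, keepdims=True)
    var = g.var(axis=2, keepdims=True)
    g = (g - mean) / np.sqrt(var + eps)
    return g.reshape(n, c, h, w)

def conditional_group_norm(x, cond, w_gamma, w_beta, num_groups, eps=1e-5):
    """Group norm whose affine parameters are functions of `cond`.

    x: (N, C, H, W), cond: (N, D), w_gamma and w_beta: (D, C).
    The linear maps are an assumption made for illustration only.
    """
    x_hat = group_norm(x, num_groups, eps)
    gamma = 1.0 + cond @ w_gamma  # (N, C), scale predicted as an offset from 1
    beta = cond @ w_beta          # (N, C), shift
    return gamma[:, :, None, None] * x_hat + beta[:, :, None, None]

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4, 3, 3))
cond = rng.normal(size=(2, 5))
w_gamma = np.zeros((5, 4))  # zero weights reduce the layer to plain group norm
w_beta = np.zeros((5, 4))
y = conditional_group_norm(x, cond, w_gamma, w_beta, num_groups=2)
```

A conditional batch normalization layer would differ only in the statistics: the mean and variance would be computed per channel across the batch rather than per sample within each channel group.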
