Abstract
Recently, self-normalizing neural networks (SNNs) have been proposed with the intention of avoiding batch or weight normalization. The key step in SNNs is to properly scale the exponential linear unit (the result is referred to as SELU) so that normalization is inherently incorporated, based on the central limit theorem. SELU is a monotonically increasing function with an approximately constant negative output for large negative input. In this work, we propose a new activation function that breaks the monotonicity property of SELU while still preserving the self-normalizing property. Unlike SELU, the new function introduces a bump-shaped function in the region of negative input by regularizing a linear function with a scaled exponential function; it is referred to as a scaled exponentially-regularized linear unit (SERLU). The bump-shaped function has approximately zero response to large negative input while being able to push the output of SERLU towards zero mean statistically. To effectively combat over-fitting, we develop a so-called shift-dropout for SERLU, which includes standard dropout as a special case. Experimental results on MNIST, CIFAR10 and CIFAR100 show that SERLU-based neural networks provide consistently promising results in comparison to five other activation functions: ELU, SELU, Swish, Leaky ReLU and ReLU.
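
To make the contrast between the two shapes concrete, the following is a minimal sketch rather than the implementation used in the experiments. It assumes that "regularizing a linear function with a scaled exponential function" means the negative branch takes the form lam * alpha * x * exp(x); this form is not stated in the abstract. The function names and the default values lam = alpha = 1.0 are illustrative placeholders and are not the self-normalizing constants.

```python
import math


def serlu(x, lam=1.0, alpha=1.0):
    """Sketch of a SERLU-style activation (assumed form, placeholder constants).

    Positive inputs are scaled linearly. Negative inputs pass through
    x * exp(x), which forms a bump and decays to zero as x -> -infinity.
    """
    if x >= 0:
        return lam * x
    return lam * alpha * x * math.exp(x)


def selu_like(x, lam=1.0, alpha=1.0):
    """Scaled ELU with the same placeholder constants, for comparison.

    It is monotonically increasing and saturates near -lam * alpha.
    """
    if x >= 0:
        return lam * x
    return lam * alpha * (math.exp(x) - 1.0)


if __name__ == "__main__":
    # Print both activations over a few inputs to show the different tails.
    for x in (-30.0, -3.0, -1.0, -0.5, 0.0, 2.0):
        print(f"x={x:6.1f}  serlu={serlu(x):+.6f}  selu_like={selu_like(x):+.6f}")
```

Under these assumptions the negative branch reaches its minimum at x = -1, since the derivative of x * exp(x) is (1 + x) * exp(x), and it returns towards zero for large negative input, whereas the SELU-style function levels off at a constant negative value.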