Softmax loss is arguably one of the most popular losses for training CNN models for image classification. However, recent work has exposed its limitations in feature discriminability. This paper offers a new viewpoint on the weaknesses of softmax loss. On the one hand, the CNN features learned with softmax loss are often inadequately discriminative. We hence introduce a soft-margin softmax function to explicitly encourage discrimination between different classes. On the other hand, the classifier learned with softmax loss is weak. We propose to assemble multiple such weak classifiers into a strong one, inspired by the recognition that diversity among weak classifiers is critical to a good ensemble. To achieve this diversity, we adopt the Hilbert-Schmidt Independence Criterion (HSIC). Considering these two aspects in one framework, we design a novel loss, named Ensemble soft-Margin Softmax (EM-Softmax). Extensive experiments on benchmark datasets show the superiority of our design over the baseline softmax loss and several state-of-the-art alternatives.
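
The two ingredients named above can be illustrated with a minimal sketch. It is not the paper's implementation, and it rests on assumptions the abstract does not state: the soft margin is taken to be a constant `margin` subtracted from the target-class logit before the softmax, and HSIC is computed with the standard biased empirical estimator, trace(KHLH) / (n - 1)^2, using linear kernels. All function names are hypothetical.

```python
import math

def soft_margin_softmax_loss(logits, target, margin=0.0):
    """Cross-entropy where the target logit is first reduced by `margin`.

    Assumed form of the soft margin; margin=0 recovers plain softmax loss.
    """
    shifted = [z - margin if k == target else z for k, z in enumerate(logits)]
    top = max(shifted)  # subtract the max for numerical stability
    log_sum = top + math.log(sum(math.exp(z - top) for z in shifted))
    return log_sum - shifted[target]

def linear_gram(rows):
    """Gram matrix of a linear kernel: K[i][j] = <rows[i], rows[j]>."""
    return [[sum(a * b for a, b in zip(u, v)) for v in rows] for u in rows]

def center(K):
    """Double-center K, i.e. compute H K H with H = I - (1/n) 1 1^T."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + total for j in range(n)]
            for i in range(n)]

def hsic(xs, ys):
    """Biased empirical HSIC between paired samples xs and ys (linear kernels)."""
    n = len(xs)
    Kc = center(linear_gram(xs))
    Lc = center(linear_gram(ys))
    # trace(Kc Lc) equals trace(K H L H) because H is idempotent
    trace = sum(Kc[i][j] * Lc[j][i] for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2
```

Under these assumptions, a positive `margin` increases the loss for the same logits, so the target logit must exceed the others by a larger gap to reach the same loss value. An HSIC value near zero indicates little dependence between the two inputs under the chosen kernel, which is why it can act as a diversity penalty between ensemble members.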