Investigation of Large-Margin Softmax in Neural Language Modeling

Abstract

To encourage intra-class compactness and inter-class separability among trainable feature vectors, large-margin softmax methods have been developed and widely applied in the face recognition community. Introducing the large-margin concept into the softmax is reported to bring desirable properties such as enhanced discriminative power, less overfitting, and well-defined geometric intuitions. Nowadays, language modeling is commonly approached with neural networks using softmax and cross entropy. In this work, we investigate whether introducing large margins to neural language models improves perplexity and, consequently, word error rate in automatic speech recognition. Specifically, we first implement and test various types of conventional margins following previous work in face recognition. To account for the distribution of natural language data, we then compare different strategies for word vector norm-scaling. After that, we apply the best norm-scaling setup in combination with various margins and conduct neural language model rescoring experiments in automatic speech recognition. We find that although perplexity deteriorates slightly, neural language models with large-margin softmax can yield a word error rate similar to that of the standard softmax baseline. Finally, the expected margins are analyzed through visualization of word vectors, showing that syntactic and semantic relationships are also preserved.
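
As a rough illustration of what a margin in the softmax can look like, the sketch below applies an additive cosine margin to the target word before the cross entropy is computed. The abstract does not say which margin types or norm-scaling strategies are used, so the choice of margin, the fixed `scale`, the default values, and all function and parameter names here are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def additive_margin_softmax_loss(features, word_vectors, targets,
                                 margin=0.35, scale=30.0):
    """Cross entropy with an additive cosine margin on the target word.

    features:     (batch, dim) context vectors from the language model
    word_vectors: (vocab, dim) output word vectors
    targets:      (batch,) integer ids of the correct next words
    margin, scale: hypothetical hyperparameters, not taken from the paper
    """
    # Normalize both sides so each logit is the cosine of an angle.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = word_vectors / np.linalg.norm(word_vectors, axis=1, keepdims=True)
    cos = f @ w.T  # shape (batch, vocab)

    # Subtract the margin from the target-word cosine only, which makes
    # the correct word harder to score and pushes classes apart.
    rows = np.arange(len(targets))
    cos_m = cos.copy()
    cos_m[rows, targets] -= margin

    # Rescale with a fixed norm (assumed), then take a stable log-softmax.
    logits = scale * cos_m
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[rows, targets].mean()
```

With `margin=0.0` the function reduces to ordinary cross entropy over scaled cosine logits, and the exponential of the returned value is the perplexity under that model. Any positive margin lowers the target logit and therefore raises the training loss for the same vectors.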
