Text classification with pixel embedding

Abstract

We propose a novel framework for understanding text by converting sentences or articles into video-like 3-dimensional tensors. Each frame, corresponding to a slice of the tensor, is a word image rendered from the word's shape. The length of the tensor equals the number of words in the sentence or article. This transformation from text to a 3-dimensional tensor makes it very convenient to implement an $n$-gram model with convolutional neural networks for text analysis. Concretely, we apply a 3-dimensional convolutional kernel to the 3-dimensional text tensor. The first two dimensions of the kernel equal the size of the word image, and the last dimension of the kernel is $n$. That is, each time we slide the 3-dimensional kernel over a word sequence, the convolution covers $n$ word images and outputs a scalar. By repeating this process for each $n$-gram along the sentence or article with multiple kernels, we obtain a 2-dimensional feature map. A 1-dimensional max-over-time pooling is then applied to this feature map, and finally three fully-connected layers perform the text classification. Experiments on several text classification datasets demonstrate surprisingly superior performance of the proposed model in comparison with existing methods.
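
The convolution and pooling steps described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the image size (8 x 16), the number of kernels, and the random arrays standing in for rendered word images and learned kernels are all assumptions made for illustration, and the three fully-connected layers are omitted.

```python
import numpy as np


def ngram_conv_features(text_tensor, kernels):
    """Slide 3-D kernels along the word axis of a text tensor.

    text_tensor: array of shape (H, W, T), one H x W word image per word.
    kernels:     array of shape (K, H, W, n), K kernels spanning n words each.
    Returns a 2-D feature map of shape (K, T - n + 1).
    """
    H, W, T = text_tensor.shape
    K, kH, kW, n = kernels.shape
    # Each kernel must cover a whole word image, so it only moves along T.
    assert (kH, kW) == (H, W), "kernel must match the word-image size"
    assert n <= T, "n-gram length cannot exceed the number of words"

    steps = T - n + 1
    feature_map = np.empty((K, steps))
    for t in range(steps):
        window = text_tensor[:, :, t:t + n]  # n consecutive word images
        # One scalar per kernel for this n-gram.
        feature_map[:, t] = np.tensordot(
            kernels, window, axes=([1, 2, 3], [0, 1, 2])
        )
    return feature_map


def max_over_time(feature_map):
    """Keep the strongest response of each kernel across all n-grams."""
    return feature_map.max(axis=1)


# Hypothetical sizes: 10 words, 8 x 16 word images, 4 trigram kernels.
rng = np.random.default_rng(0)
H, W, T, n, K = 8, 16, 10, 3, 4
text_tensor = rng.random((H, W, T))          # stand-in for rendered words
kernels = rng.standard_normal((K, H, W, n))  # stand-in for learned weights

feature_map = ngram_conv_features(text_tensor, kernels)
pooled = max_over_time(feature_map)
print(feature_map.shape, pooled.shape)
```

Because the kernel spans the full word image, it has nowhere to move in the first two dimensions, so the output for each kernel is a 1-dimensional sequence over $n$-gram positions; stacking the kernels gives the 2-dimensional feature map, and pooling reduces it to one value per kernel regardless of the text length.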
