Doc2Im: document to image conversion through self-attentive embedding

Abstract

Text classification is a fundamental task in NLP applications. Recent research in this field has largely divided into two major sub-fields: learning representations on one side, and on the other, learning deeper models, both sequential and convolutional, which in turn connect back to the representation. We posit that the stronger the representation, the simpler the classifier needed to achieve high performance. In this paper we propose a novel direction for text classification research, wherein we convert text into a representation very similar to an image, such that any deep network able to handle images is equally able to handle text. We take a deeper look at the representation of documents as images and subsequently use very simple convolution-based models taken as-is from the computer vision domain. This image can be cropped, rescaled, resampled and augmented just like any other image, so that it works with most of the state-of-the-art large convolution-based models designed for large image datasets. We report strong results on several recent benchmarks in related fields. We perform transfer learning experiments, both from text to text and from image to text. We believe this is a paradigm shift from the way document understanding and text classification have traditionally been done, and that it will drive numerous novel research ideas in the community.
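
The abstract does not specify how the document image is built. As a rough illustration only, the sketch below assumes one plausible construction: a structured, multi-hop self-attention over token embeddings yields a fixed-size r x d matrix M = A H, with A = softmax(W2 tanh(W1 H^T)) and the softmax taken over tokens, so documents of any length map to the same image size. M is then treated as a single-channel image and passed through standard image operations (random crop, nearest-neighbour resize). The embedding table, the untrained random weights W1 and W2, and the function names self_attentive_image, random_crop and resize_nearest are all hypothetical and are not taken from the paper.

```python
import math
import random


def matmul(a, b):
    # a: n x k, b: k x m  ->  n x m (plain nested lists, no dependencies)
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def softmax_rows(m):
    # Numerically stable softmax applied independently to each row.
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp(x - mx) for x in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def self_attentive_image(H, hops, attn_dim, rng):
    """Hypothetical construction. H: n x d token embeddings.
    Returns M = A H (hops x d), A = softmax(W2 tanh(W1 H^T)) over tokens."""
    d = len(H[0])
    W1 = random_matrix(attn_dim, d, rng)      # attn_dim x d, untrained
    W2 = random_matrix(hops, attn_dim, rng)   # hops x attn_dim, untrained
    hidden = [[math.tanh(x) for x in row] for row in matmul(W1, transpose(H))]
    A = softmax_rows(matmul(W2, hidden))      # hops x n, each row sums to 1
    return matmul(A, H)                       # hops x d "image"


def random_crop(img, h, w, rng):
    # Image-style augmentation: take a random h x w window.
    top = rng.randint(0, len(img) - h)
    left = rng.randint(0, len(img[0]) - w)
    return [row[left:left + w] for row in img[top:top + h]]


def resize_nearest(img, out_h, out_w):
    # Nearest-neighbour rescale to the input size a vision model expects.
    in_h, in_w = len(img), len(img[0])
    return [[img[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
            for i in range(out_h)]


rng = random.Random(0)
tokens = "text classification as image classification".split()
# Hypothetical embedding table: one random 16-dim vector per token type.
vocab = {tok: [rng.gauss(0.0, 1.0) for _ in range(16)] for tok in sorted(set(tokens))}
H = [vocab[t] for t in tokens]                            # 5 x 16
img = self_attentive_image(H, hops=8, attn_dim=10, rng=rng)  # 8 x 16
crop = random_crop(img, 6, 12, rng)                       # 6 x 12
resized = resize_nearest(crop, 32, 32)                    # 32 x 32
print(len(img), len(img[0]), len(resized), len(resized[0]))
```

Because each row of A is a probability distribution over tokens, every row of M is a convex combination of token embeddings, and the image height depends only on the number of attention hops, not on document length. A real system would learn W1, W2 and the embeddings end to end with the downstream classifier.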
