Abstract
Automating information extraction from form-like documents at scale is a pressing need because of its potential to automate business workflows across many industries, such as financial services, insurance, and healthcare. The key challenge is that form-like documents in these workflows can be laid out in virtually infinitely many ways; hence, a good solution to this problem should generalize to documents with unseen layouts and languages. Solving this problem requires a holistic understanding of both the textual segments and the visual cues within a document, which is non-trivial. While the natural language processing and computer vision communities are starting to tackle this problem, there has not been much focus on (1) data efficiency and (2) the ability to generalize across different document types and languages.

In this paper, we show that when only a small number of labeled documents (~50) are available for training, a straightforward transfer learning approach from a larger labeled corpus that is considerably different in structure yields up to a 27 F1 point improvement over simply training on the small corpus in the target domain. We improve on this with a simple multi-domain transfer learning approach, which is currently in production use, and show that it yields up to a further 8 F1 point improvement. We make the case that data efficiency is critical for information extraction systems to scale to hundreds of different document types, and that learning good representations is critical to accomplishing this.