Language Model Supervision for Handwriting Recognition Model Adaptation

  • 2018-08-04 04:27:05
  • Chris Tensmeyer, Curtis Wigington, Brian Davis, Seth Stewart, Tony Martinez, William Barrett
  • 13

Abstract

Training state-of-the-art offline handwriting recognition (HWR) models requires large labeled datasets, but unfortunately such datasets are not available in all languages and domains due to the high cost of manual labeling. We address this problem by showing how high resource languages can be leveraged to help train models for low resource languages. We propose a transfer learning methodology where we adapt HWR models trained on a source language to a target language that uses the same writing script. This methodology only requires labeled data in the source language, unlabeled data in the target language, and a language model of the target language. The language model is used in a bootstrapping fashion to refine predictions in the target language for use as ground truth in training the model. Using this approach we demonstrate improved transferability among French, English, and Spanish languages using both historical and modern handwriting datasets. In the best case, transferring with the proposed methodology results in character error rates nearly as good as full supervised training.
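
To make the bootstrapping idea concrete, the following is a minimal sketch of the refinement step only, not the authors' implementation. It assumes a hypothetical `recognize` callable standing in for the source-trained HWR model, and it replaces the target-language language model with a much simpler stand-in: snapping each predicted word to its nearest entry in a small lexicon by edit distance. The names `edit_distance`, `refine_with_lexicon`, and `make_pseudo_labels` are illustrative and do not come from the paper.

```python
from typing import Callable, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def refine_with_lexicon(prediction: str, lexicon: List[str]) -> str:
    """Snap each predicted word to its nearest lexicon entry.

    Assumption: this is a simplified stand-in for language-model decoding.
    A real system would score whole sequences with a language model.
    """
    refined = [min(lexicon, key=lambda w: edit_distance(word, w))
               for word in prediction.split()]
    return " ".join(refined)


def make_pseudo_labels(images: List[str],
                       recognize: Callable[[str], str],
                       lexicon: List[str]) -> List[Tuple[str, str]]:
    """Pair each unlabeled target image with its refined prediction."""
    return [(img, refine_with_lexicon(recognize(img), lexicon))
            for img in images]


if __name__ == "__main__":
    # Hypothetical raw outputs of a source-language model on target images.
    raw_outputs = {"img0": "la caza es grando"}
    lexicon = ["la", "casa", "es", "grande"]
    print(make_pseudo_labels(["img0"], raw_outputs.__getitem__, lexicon))
```

In a full pipeline, the refined strings would serve as training targets for the next round of model updates on the target language, and the predict, refine, retrain cycle would repeat. That retraining step is omitted here because it depends on the recognizer architecture.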

 
