Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages

Abstract

Spelling normalization for low-resource languages is a challenging task because the patterns are hard to predict and large corpora are usually required to collect enough examples. This work compares a neural model and character language models trained on varying amounts of target-language data. Our usage scenario is interactive correction that starts with nearly zero training examples and improves the models as more data is collected, for example within a chat app. Such models are designed to be improved incrementally as users give feedback. In this work, we design a spelling correction system for low-resource languages that embeds a knowledge base and a prediction model. Experimental results on multiple languages show that the model can become effective with a small amount of data. We perform experiments on both natural and synthetic data, as well as on data from two endangered languages (Ainu and Griko). Finally, we built a prototype system that was used for a small case study on Hinglish, which further demonstrated the suitability of our approach in real-world scenarios.
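
To make the interactive setting concrete, the following is a minimal sketch of a correction loop that consults a knowledge base first and falls back to a prediction step otherwise. It is an illustration under stated assumptions, not the system evaluated in this work: the class `InteractiveCorrector`, its methods, and the example words are hypothetical, and a simple string-similarity lookup from Python's standard library stands in for the neural and character language models.

```python
from difflib import get_close_matches


class InteractiveCorrector:
    """Hypothetical sketch: knowledge base lookup with a fallback predictor."""

    def __init__(self):
        # Maps an observed spelling to the form a user confirmed for it.
        self.knowledge_base = {}
        # Set of confirmed normalized forms seen so far.
        self.vocabulary = set()

    def suggest(self, word):
        # 1. Exact hit in the knowledge base: reuse the confirmed correction.
        if word in self.knowledge_base:
            return self.knowledge_base[word]
        # 2. Fallback: nearest confirmed form by string similarity.
        #    ASSUMPTION: this stands in for a trained prediction model.
        matches = get_close_matches(word, sorted(self.vocabulary), n=1, cutoff=0.6)
        # 3. With no evidence at all, leave the word unchanged.
        return matches[0] if matches else word

    def feedback(self, word, correction):
        # Store the user's correction so later suggestions improve.
        self.knowledge_base[word] = correction
        self.vocabulary.add(correction)


# Illustrative usage; the word pairs are invented for this example.
corrector = InteractiveCorrector()
before = corrector.suggest("nhi")      # no data yet, word is returned as-is
corrector.feedback("nhi", "nahin")     # user supplies a correction
after = corrector.suggest("nhi")       # answered from the knowledge base
unseen = corrector.suggest("nahi")     # answered by the fallback step
```

The sketch starts with no training examples and improves only through `feedback`, which mirrors the near-zero-data scenario described above; in a real system the fallback would be replaced by a model retrained or updated as corrections accumulate.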
