Cross-Lingual Training for Automatic Question Generation

Abstract

Automatic question generation (QG) is a challenging problem in natural language understanding. QG systems are typically built assuming access to a large number of training instances, where each instance is a question and its corresponding answer. For a new language, such training instances are hard to obtain, making the QG problem even more challenging. Using this as our motivation, we study the reuse of an available large QG dataset in a secondary language (e.g. English) to learn a QG model for a primary language (e.g. Hindi) of interest. For the primary language, we assume access to a large amount of monolingual text but only a small QG dataset. We propose a cross-lingual QG model which uses the following training regime: (i) unsupervised pretraining of language models in both primary and secondary languages and (ii) joint supervised training for QG in both languages. We demonstrate the efficacy of our proposed approach using two different primary languages, Hindi and Chinese. We also create and release a new question answering dataset for Hindi consisting of 6555 sentences.
