SSP: Self-Supervised Prompting for Cross-Lingual Transfer to Low-Resource Languages using Large Language Models

Abstract

Recently, very large language models (LLMs) have shown exceptional performance on several English NLP tasks with just in-context learning (ICL), but their utility in other languages is still underexplored. We investigate their effectiveness for NLP tasks in low-resource languages (LRLs), especially in the setting of zero-labelled cross-lingual transfer (0-CLT), where no labelled training data for the target language is available; however, training data from one or more related medium-resource languages (MRLs) is utilized, alongside the available unlabeled test data for the target language. We introduce Self-Supervised Prompting (SSP), a novel ICL approach tailored for the 0-CLT setting. SSP is based on the key observation that LLMs output more accurate labels if in-context exemplars are from the target language (even if their labels are slightly noisy). To operationalize this, since target language training data is not available in 0-CLT, SSP operates in two stages. In Stage I, the target language's test data is noisily labeled using source MRL training data. In Stage II, these noisily labeled test data points are used as exemplars in ICL for further improved labelling. Additionally, our implementation of SSP uses a novel Integer Linear Programming (ILP)-based exemplar selection that balances similarity, prediction confidence (when available) and label coverage. Experiments on three tasks and eleven LRLs (from three regions) demonstrate that SSP strongly outperforms existing SOTA fine-tuned and prompting-based baselines in the 0-CLT setup.
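
To make the Stage II selection step concrete, here is a minimal Python sketch of exemplar selection that trades off similarity and confidence under a label-coverage constraint. It is an illustration only, not the paper's formulation: the abstract does not state the ILP objective or constraints, so the scoring rule (`sim + alpha * conf`), the field names, the `alpha` weight, and the helper names `select_exemplars` and `build_prompt` are all assumptions. Exhaustive search stands in for an ILP solver and is only practical for very small pools.

```python
from itertools import combinations


def select_exemplars(pool, k, labels, alpha=1.0):
    """Pick k exemplars maximizing sum(sim + alpha * conf), subject to
    every label in `labels` appearing at least once (assumed objective).

    pool: list of dicts with keys 'text', 'label', 'sim', 'conf'.
    Brute force replaces the ILP solver here; use only on tiny pools.
    """
    required = set(labels)
    best, best_score = None, float("-inf")
    for combo in combinations(range(len(pool)), k):
        covered = {pool[i]["label"] for i in combo}
        if not required <= covered:
            continue  # label-coverage constraint violated
        score = sum(pool[i]["sim"] + alpha * pool[i]["conf"] for i in combo)
        if score > best_score:
            best, best_score = combo, score
    return [pool[i] for i in best] if best is not None else []


def build_prompt(exemplars, query):
    """Format the chosen (noisily labeled) exemplars plus the query as an ICL prompt."""
    blocks = [f"Input: {e['text']}\nLabel: {e['label']}" for e in exemplars]
    blocks.append(f"Input: {query}\nLabel:")
    return "\n\n".join(blocks)


# Hypothetical Stage I output: target-language test items with noisy labels,
# a similarity score to the current query, and a prediction confidence.
pool = [
    {"text": "t1", "label": "POS", "sim": 0.9, "conf": 0.8},
    {"text": "t2", "label": "POS", "sim": 0.8, "conf": 0.7},
    {"text": "t3", "label": "NEG", "sim": 0.3, "conf": 0.6},
    {"text": "t4", "label": "NEG", "sim": 0.2, "conf": 0.5},
]
chosen = select_exemplars(pool, k=2, labels=["POS", "NEG"])
prompt = build_prompt(chosen, "t5")
```

In this toy pool the two highest-scoring items share the label POS, so the coverage constraint swaps one of them for the best NEG item. That is the kind of trade-off the abstract attributes to the ILP-based selection, here shown under the assumed objective.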
