Multilingual Contrastive Decoding via Language-Agnostic Layers Skipping

Abstract

Decoding by contrasting layers (DoLa) is designed to improve the generation quality of large language models (LLMs) by contrasting the prediction probabilities between an early-exit output (amateur logits) and the final output (expert logits). However, we find that this approach does not work well on non-English tasks. Inspired by previous interpretability work on language transition during the model's forward pass, we discover that this issue arises from a language mismatch between the early-exit output and the final output. In this work, we propose an improved contrastive decoding algorithm that is effective for diverse languages beyond English. To obtain more helpful amateur logits, we devise two strategies to skip a set of bottom, language-agnostic layers based on our preliminary analysis. Experimental results on multilingual reasoning benchmarks demonstrate that our proposed method outperforms previous contrastive decoding baselines and substantially improves LLMs' chain-of-thought reasoning accuracy across 11 languages. The project will be available at: https://github.com/NJUNLP/SkipLayerCD.
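To make the expert/amateur contrast concrete, here is a minimal sketch of the generic contrastive scoring step used by DoLa-style decoding: each candidate token is scored by log p_expert minus log p_amateur, and tokens whose expert probability falls below alpha times the top expert probability are masked out (an adaptive plausibility cutoff). This is not the authors' implementation. The function names and the default `alpha=0.1` are illustrative assumptions, and the sketch takes both logit vectors as given. It does not show how the paper obtains amateur logits by skipping language-agnostic bottom layers.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over a single logit vector."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def contrastive_scores(
    expert_logits: List[float],
    amateur_logits: List[float],
    alpha: float = 0.1,  # hypothetical default for the plausibility cutoff
) -> List[float]:
    """Score tokens by log p_expert - log p_amateur.

    Tokens whose expert probability is below alpha * (max expert probability)
    receive -inf, so implausible tokens can never win the contrast.
    """
    lp_exp = log_softmax(expert_logits)
    lp_ama = log_softmax(amateur_logits)
    # log(alpha * max p) = log(alpha) + max log p
    threshold = math.log(alpha) + max(lp_exp)
    scores = []
    for e, a in zip(lp_exp, lp_ama):
        scores.append(e - a if e >= threshold else float("-inf"))
    return scores


def contrastive_greedy_token(
    expert_logits: List[float],
    amateur_logits: List[float],
    alpha: float = 0.1,
) -> int:
    """Return the index of the highest-scoring token under the contrast."""
    scores = contrastive_scores(expert_logits, amateur_logits, alpha)
    return max(range(len(scores)), key=lambda i: scores[i])


if __name__ == "__main__":
    expert = [2.0, 1.9, -1.0]   # final-layer (expert) logits
    amateur = [2.0, 0.0, -1.0]  # amateur logits, e.g. from a weaker pass
    print(contrastive_scores(expert, amateur))
    print(contrastive_greedy_token(expert, amateur))  # prints 1
```

In this toy example, plain greedy decoding on the expert logits would pick token 0. Token 0, however, is equally likely under the amateur, while token 1 gains far more probability in the expert. The contrast therefore favors token 1, and token 2 is removed by the cutoff. The abstract's point is that the amateur logits must be in the same language as the expert's for this difference to be informative.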
