Instruction tuning has remarkably advanced large language models (LLMs) in understanding and responding to diverse human instructions. Despite the success in high-resource languages, its application in lower-resource ones faces challenges due to the imbalanced foundational abilities of LLMs across different languages, stemming from the uneven language distribution in their pre-training data. To tackle this issue, we propose pivot language guided generation (PLUG), an approach that utilizes a high-resource language, primarily English, as the pivot to enhance instruction tuning in lower-resource languages. It trains the model to first process instructions in the pivot language, and then produce responses in the target language. To evaluate our approach, we introduce a benchmark, X-AlpacaEval, of instructions in 4 languages (Chinese, Korean, Italian, and Spanish), each annotated by professional translators. Our approach demonstrates a significant improvement in the instruction-following abilities of LLMs by 29% on average, compared to directly responding in the target language alone. Further experiments validate the versatility of our approach by employing alternative pivot languages beyond English to assist languages where LLMs exhibit lower proficiency.
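
To make the pivot-then-target idea concrete, the following is a minimal sketch of how one training example might be laid out, assuming the model is trained to emit pivot-language text before the target-language response. The `PivotExample` class, the `build_training_pair` function, the field names, and the label template are all hypothetical illustrations, not the format used in this work.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class PivotExample:
    """One training example. All field names are hypothetical."""
    target_instruction: str  # instruction written in the target language
    pivot_instruction: str   # the same instruction in the pivot language
    pivot_response: str      # response written in the pivot language
    target_response: str     # final response in the target language


def build_training_pair(
    ex: PivotExample, pivot: str = "English", target: str = "Chinese"
) -> Tuple[str, str]:
    """Return (prompt, completion) strings for supervised fine-tuning.

    The completion places the pivot-language text first and the
    target-language response last, so the model processes the
    instruction in the pivot language before answering in the target one.
    The label template below is an assumption made for illustration.
    """
    prompt = f"Instruction ({target}): {ex.target_instruction}\n"
    completion = (
        f"Instruction ({pivot}): {ex.pivot_instruction}\n"
        f"Response ({pivot}): {ex.pivot_response}\n"
        f"Response ({target}): {ex.target_response}"
    )
    return prompt, completion
```

In this sketch, swapping the `pivot` argument for another language would correspond to the alternative-pivot setting mentioned above; at inference time only the final target-language segment would be shown to the user.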