Abstract
Large language models (LLMs) have demonstrated impressive capabilities across diverse languages. This study explores how LLMs handle multilingualism. Based on observed language ratio shifts among layers and the relationships between network structures and certain capabilities, we hypothesize the LLM's multilingual workflow ($\texttt{MWork}$): LLMs initially understand the query, converting multilingual inputs into English for task-solving. In the intermediate layers, they employ English for thinking and incorporate multilingual knowledge with self-attention and feed-forward structures, respectively. In the final layers, LLMs generate responses aligned with the original language of the query. To verify $\texttt{MWork}$, we introduce Parallel Language-specific Neuron Detection ($\texttt{PLND}$) to identify activated neurons for inputs in different languages without any labeled data. Using $\texttt{PLND}$, we validate $\texttt{MWork}$ through extensive experiments involving the deactivation of language-specific neurons across various layers and structures. Moreover, $\texttt{MWork}$ allows fine-tuning of language-specific neurons with a small dataset, enhancing multilingual abilities in a specific language without compromising others. This approach results in an average improvement of $3.6\%$ for high-resource languages and $2.3\%$ for low-resource languages across all tasks with just $400$ documents.
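To make the idea of detecting neurons by deactivation concrete, the following is a minimal toy sketch, not the $\texttt{PLND}$ procedure itself. It assumes a single small feed-forward block with ReLU, measures a neuron's impact as the L2 change in the block output when that neuron is zeroed, and keeps neurons whose impact exceeds a threshold on every input of an unlabeled corpus. The names `ffn_forward`, `neuron_impact`, and `language_neurons`, the weights, the threshold, and the two "corpora" are all hypothetical and chosen only for illustration.

```python
import math

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def ffn_forward(w_in, w_out, x, masked=frozenset()):
    """Toy feed-forward block; neurons listed in `masked` are deactivated (set to 0)."""
    hidden = relu(matvec(w_in, x))
    hidden = [0.0 if i in masked else h for i, h in enumerate(hidden)]
    return matvec(w_out, hidden)

def neuron_impact(w_in, w_out, x, neuron):
    """L2 change in the block output when a single hidden neuron is zeroed."""
    base = ffn_forward(w_in, w_out, x)
    ablated = ffn_forward(w_in, w_out, x, masked={neuron})
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(base, ablated)))

def language_neurons(w_in, w_out, corpus, threshold):
    """Neurons whose impact exceeds `threshold` on every input in `corpus`.

    Assumption: 'consistently above a threshold' is an illustrative selection
    rule, not necessarily the criterion used in the paper.
    """
    n_hidden = len(w_in)
    selected = set(range(n_hidden))
    for x in corpus:
        active = {i for i in range(n_hidden)
                  if neuron_impact(w_in, w_out, x, i) > threshold}
        selected &= active
    return selected

# Hypothetical toy weights: 2 inputs -> 3 hidden neurons -> 2 outputs.
w_in = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_out = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]

# Two unlabeled "corpora" standing in for inputs in two languages.
corpus_a = [[1.0, 0.0], [2.0, 0.0]]
corpus_b = [[0.0, 1.0], [0.0, 3.0]]

neurons_a = language_neurons(w_in, w_out, corpus_a, threshold=0.5)
neurons_b = language_neurons(w_in, w_out, corpus_b, threshold=0.5)

only_a = neurons_a - neurons_b   # specific to the first corpus
only_b = neurons_b - neurons_a   # specific to the second corpus
shared = neurons_a & neurons_b   # active for both
```

In this toy setting, passing `only_a` as `masked` to `ffn_forward` changes the output on `corpus_a` inputs while leaving `corpus_b` inputs untouched, which mirrors the kind of deactivation experiment the abstract describes. A real model would require applying the same idea per layer and per structure (self-attention and feed-forward), which this sketch does not attempt.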