Abstract
To date, toxicity mitigation in language models has focused almost entirely on single-language settings. As language models embrace multilingual capabilities, it is crucial that our safety measures keep pace. Recognizing this research gap, our approach expands the scope of conventional toxicity mitigation to address the complexities presented by multiple languages. In the absence of sufficient annotated datasets across languages, we employ translated data to evaluate and enhance our mitigation techniques. We also compare finetuning mitigation approaches against retrieval-augmented techniques under both static and continual toxicity mitigation scenarios. This allows us to examine the effects of translation quality and cross-lingual transfer on toxicity mitigation. We also explore how model size and data quantity affect the success of these mitigation efforts. Covering nine languages, our study represents a broad array of linguistic families and levels of resource availability, ranging from high to mid-resource languages. Through comprehensive experiments, we provide insights into the complexities of multilingual toxicity mitigation, paving the way for future research in this increasingly important field. Code and data are available at https://github.com/for-ai/goodtriever.