Abstract
To reduce toxic degeneration in a pretrained Language Model (LM), previous work on Language Model detoxification has focused on reducing the toxicity of the generation itself (self-toxicity) without considering the context. As a result, a type of implicit offensive language, in which the generation supports offensive language in the context, is ignored. Unlike the LM control tasks in previous work, where the desired attributes are fixed for generation, here the desired stance of the generation depends on the offensiveness of the context. Therefore, we propose a novel control method to perform context-dependent detoxification that takes stance into consideration. We introduce meta prefixes to learn the contextualized stance control strategy and to generate the stance control prefix according to the input context. The generated stance prefix is then combined with the toxicity control prefix to guide the response generation. Experimental results show that our proposed method can effectively learn the context-dependent stance control strategies while keeping the self-toxicity of the underlying LM low.