Genshin: General Shield for Natural Language Processing with Large Language Models

Abstract

Large language models (LLMs) such as ChatGPT, Gemini, and LLaMA have been trending recently, demonstrating considerable advancement and generalization power across countless domains. However, LLMs create an even bigger black box that exacerbates opacity, and interpretability is limited to a few approaches. The uncertainty and opacity inherent in LLMs restrict their application in high-stakes domains such as financial fraud and phishing. Current approaches mainly rely on traditional text classification with post-hoc interpretable algorithms. These approaches are vulnerable to attackers who craft versatile adversarial samples to break the system's defenses, which forces users to trade efficiency against robustness. To address this issue, we propose a novel cascading framework called Genshin (General Shield for Natural Language Processing with Large Language Models), which uses LLMs as defensive one-time plug-ins. Most applications of LLMs try to transform text into something new or structured. Genshin instead uses LLMs to recover text to its original state. Genshin aims to combine the generalizability of the LLM, the discriminative power of the median model, and the interpretability of the simple model. Our experiments on sentiment analysis and spam detection reveal fatal flaws in current median models and promising results for LLMs' recovery ability, demonstrating that Genshin is both effective and efficient. Our ablation study yields several intriguing observations. Using the LLM defender, a tool derived from the 4th paradigm of NLP, we reproduce BERT's optimal 15% mask rate result from the 3rd paradigm. Additionally, when the LLM is employed as an adversarial tool, attackers can execute effective attacks that are nearly semantically lossless.
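The cascade described above (an LLM defender that recovers the text, a median model that makes the decision, and a simple model that explains it) can be sketched as three pluggable functions. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `cascade`, `toy_recover`, `toy_median_classify`, `toy_linear_explain`, `LEET_MAP`, and `SPAM_WEIGHTS` are all hypothetical. The recovery step only undoes a few character substitutions and stands in for an actual LLM prompt. For brevity, a single hand-set weight table plays both downstream roles.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical stand-in for the LLM defender. A real system would prompt an
# LLM to restore perturbed text; here we only undo a few character swaps.
LEET_MAP = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a",
                          "5": "s", "$": "s", "@": "a"})

# Hypothetical hand-set weights (positive = spam-like, negative = ham-like).
SPAM_WEIGHTS = {"free": 2.0, "winner": 1.5, "prize": 1.5, "click": 1.0,
                "meeting": -1.5, "report": -1.0}


def tokenize(text: str) -> List[str]:
    # Lowercase and keep alphabetic runs only; digits split words apart.
    return re.findall(r"[a-z]+", text.lower())


def toy_recover(text: str) -> str:
    # Assumed recovery: lowercase, then map look-alike characters to letters.
    return text.lower().translate(LEET_MAP)


def toy_median_classify(text: str) -> int:
    # Stand-in for the discriminative "median" model: 1 = spam, 0 = ham.
    score = sum(SPAM_WEIGHTS.get(tok, 0.0) for tok in tokenize(text))
    return int(score > 0)


def toy_linear_explain(text: str, top_k: int = 3) -> List[Tuple[str, float]]:
    # Stand-in for the simple interpretable model: per-token contributions,
    # sorted by absolute weight (ties keep their original order).
    contributions = [(tok, SPAM_WEIGHTS[tok]) for tok in tokenize(text)
                     if tok in SPAM_WEIGHTS]
    contributions.sort(key=lambda pair: abs(pair[1]), reverse=True)
    return contributions[:top_k]


@dataclass
class CascadeResult:
    recovered: str
    label: int
    explanation: List[Tuple[str, float]]


def cascade(text: str,
            recover: Callable[[str], str],
            classify: Callable[[str], int],
            explain: Callable[[str], List[Tuple[str, float]]]) -> CascadeResult:
    # Stage 1: call the (expensive) defender once to restore the input.
    recovered = recover(text)
    # Stage 2: the median model decides on the recovered text.
    label = classify(recovered)
    # Stage 3: the simple model explains that decision.
    return CascadeResult(recovered, label, explain(recovered))


if __name__ == "__main__":
    message = "Claim your FR33 PR1ZE now!"
    print(toy_median_classify(message))  # raw perturbed input, no recovery
    print(cascade(message, toy_recover, toy_median_classify, toy_linear_explain))
```

In this toy run, the perturbed message evades the classifier when recovery is skipped. `FR33` and `PR1ZE` tokenize into fragments absent from the weight table, so the score is 0 and the label is 0. The cascade first restores the message to `claim your free prize now!` and then returns label 1, with `free` and `prize` as the top contributions.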
