DetoxBench: Benchmarking Large Language Models for Multitask Fraud & Abuse Detection

Abstract

Large language models (LLMs) have demonstrated remarkable capabilities in natural language processing tasks. However, their practical application in high-stakes domains, such as fraud and abuse detection, remains an area that requires further exploration. Existing applications often focus narrowly on specific tasks such as toxicity or hate speech detection. In this paper, we present a comprehensive benchmark suite designed to assess the performance of LLMs in identifying and mitigating fraudulent and abusive language across various real-world scenarios. Our benchmark encompasses a diverse set of tasks, including detecting spam emails, hate speech, misogynistic language, and more. We evaluated several state-of-the-art LLMs, including models from Anthropic, Mistral AI, and the AI21 family, to provide a comprehensive assessment of their capabilities in this critical domain. The results indicate that while LLMs exhibit proficient baseline performance in individual fraud and abuse detection tasks, their performance varies considerably across tasks, and they particularly struggle with tasks that demand nuanced pragmatic reasoning, such as identifying diverse forms of misogynistic language. These findings have important implications for the responsible development and deployment of LLMs in high-risk applications. Our benchmark suite can serve as a tool for researchers and practitioners to systematically evaluate LLMs for multitask fraud detection and drive the creation of more robust, trustworthy, and ethically aligned systems for fraud and abuse detection.
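The abstract does not specify the evaluation protocol or metric. As a hedged illustration of why a multitask benchmark reports results per task (so that a weak task such as misogyny detection is not hidden inside an aggregate score), here is a minimal sketch that scores one classifier separately on each task. It assumes binary labels (1 = fraudulent/abusive, 0 = benign) and binary F1 as the metric; the names `f1_score`, `evaluate`, and `keyword_classifier` (a trivial stand-in for an LLM call) and the toy data are hypothetical and not taken from the paper.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical data format: each task is a list of (text, label) pairs,
# with label 1 = fraudulent/abusive and 0 = benign (an assumption).
Example = Tuple[str, int]


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """Binary F1 for the positive class; returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def evaluate(classify: Callable[[str, str], int],
             tasks: Dict[str, List[Example]]) -> Dict[str, float]:
    """Score one classifier on every task separately and return per-task F1."""
    scores: Dict[str, float] = {}
    for task_name, examples in tasks.items():
        y_true = [label for _, label in examples]
        # The classifier receives the task name so a prompt could be task-specific.
        y_pred = [classify(task_name, text) for text, _ in examples]
        scores[task_name] = f1_score(y_true, y_pred)
    return scores


def keyword_classifier(task: str, text: str) -> int:
    """Hypothetical stand-in for an LLM call: flags text containing a trigger phrase."""
    return int("free money" in text.lower())


toy_tasks = {
    "spam": [("Claim your FREE MONEY now", 1), ("Meeting moved to 3pm", 0)],
    "toy_task_b": [("placeholder positive example", 1), ("placeholder negative example", 0)],
}

if __name__ == "__main__":
    print(evaluate(keyword_classifier, toy_tasks))  # {'spam': 1.0, 'toy_task_b': 0.0}
```

The keyword heuristic scores perfectly on the toy spam task and zero on the second task, which mirrors (in miniature, on made-up data) the kind of cross-task variance the abstract describes. A macro average over tasks could be reported as well, but keeping the per-task breakdown is what exposes such gaps.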
