CPSDBench: A Large Language Model Evaluation Benchmark and Baseline for Chinese Public Security Domain

Abstract

Large Language Models (LLMs) have demonstrated significant potential and effectiveness across multiple application domains. To assess the performance of mainstream LLMs on public security tasks, this study constructs CPSDBench, a specialized evaluation benchmark tailored to the Chinese public security domain. CPSDBench integrates public-security datasets collected from real-world scenarios and supports a comprehensive assessment of LLMs across four key dimensions: text classification, information extraction, question answering, and text generation. Furthermore, this study introduces a set of innovative evaluation metrics designed to quantify more precisely how effectively LLMs perform public security tasks. Through the in-depth analysis and evaluation conducted in this research, we not only deepen our understanding of the strengths and limitations of existing models in addressing public security issues but also provide references for the future development of more accurate, customized LLMs for applications in this field.
