CryptoScope: Utilizing Large Language Models for Automated Cryptographic Logic Vulnerability Detection

Abstract

Cryptographic algorithms are fundamental to modern security, yet their implementations frequently harbor subtle logic flaws that are hard to detect. We introduce CryptoScope, a novel framework for automated cryptographic vulnerability detection powered by Large Language Models (LLMs). CryptoScope combines Chain-of-Thought (CoT) prompting with Retrieval-Augmented Generation (RAG), guided by a curated cryptographic knowledge base containing over 12,000 entries. We evaluate CryptoScope on LLM-CLVA, a benchmark of 92 cases primarily derived from real-world CVE vulnerabilities, complemented by cryptographic challenges from major Capture The Flag (CTF) competitions and synthetic examples across 11 programming languages. CryptoScope consistently improves performance over strong LLM baselines, boosting DeepSeek-V3 by 11.62%, GPT-4o-mini by 20.28%, and GLM-4-Flash by 28.69%. Additionally, it identifies 9 previously undisclosed flaws in widely used open-source cryptographic projects.
