Abstract
Cryptographic algorithms are fundamental to modern security, yet their implementations frequently harbor subtle logic flaws that are hard to detect. We introduce CryptoScope, a novel framework for automated cryptographic vulnerability detection powered by Large Language Models (LLMs). CryptoScope combines Chain-of-Thought (CoT) prompting with Retrieval-Augmented Generation (RAG), guided by a curated cryptographic knowledge base containing over 12,000 entries. We evaluate CryptoScope on LLM-CLVA, a benchmark of 92 cases primarily derived from real-world CVE vulnerabilities, complemented by cryptographic challenges from major Capture The Flag (CTF) competitions and synthetic examples across 11 programming languages. CryptoScope consistently improves performance over strong LLM baselines, boosting DeepSeek-V3 by 11.62%, GPT-4o-mini by 20.28%, and GLM-4-Flash by 28.69%. Additionally, it identifies 9 previously undisclosed flaws in widely used open-source cryptographic projects.