Automating Pharmacovigilance Evidence Generation: Using Large Language Models to Produce Context-Aware SQL

Abstract

Objective: To enhance the efficiency and accuracy of information retrieval from pharmacovigilance (PV) databases by employing Large Language Models (LLMs) to convert natural language queries (NLQs) into Structured Query Language (SQL) queries, leveraging a business context document.

Materials and Methods: We utilized OpenAI's GPT-4 model within a retrieval-augmented generation (RAG) framework, enriched with a business context document, to transform NLQs into syntactically precise SQL queries. Each NLQ was presented to the LLM randomly and independently to prevent memorization. The study was conducted in three phases, varying query complexity and assessing the LLM's performance both with and without the business context document.

Results: Our approach significantly improved NLQ-to-SQL accuracy, increasing it from 8.3% with the database schema alone to 78.3% with the business context document. This enhancement was consistent across low, medium, and high complexity queries, indicating the critical role of contextual knowledge in query generation.

Discussion: The integration of a business context document markedly improved the LLM's ability to generate accurate and contextually relevant SQL queries. Performance reached a maximum of 85% when high complexity queries were excluded, suggesting promise for routine deployment.

Conclusion: This study presents a novel approach to employing LLMs for safety data retrieval and analysis, demonstrating significant advancements in query generation accuracy. The methodology offers a framework applicable to various data-intensive domains, enhancing the accessibility and efficiency of information retrieval for non-technical users.
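
To make the two prompting conditions concrete, the following is a minimal sketch of how an NLQ-to-SQL prompt might be assembled with and without business context. It is an illustration under stated assumptions, not the pipeline used in the study: the schema, the context entries, the prompt wording, and the helper names (`tokenize`, `retrieve_context`, `build_prompt`) are all hypothetical, and simple keyword overlap stands in for whatever retrieval step the RAG framework actually uses. The call to the LLM itself is omitted.

```python
import re

# Hypothetical schema and business context entries, invented for illustration.
SCHEMA = "cases(case_id, drug_name, seriousness_flag, receipt_date)"
CONTEXT_ENTRIES = [
    "A serious case is one where seriousness_flag = 'Y'.",
    "Receipt date is stored in receipt_date as ISO 8601 text.",
    "Drug names are stored in upper case in drug_name.",
]


def tokenize(text):
    """Lower-case the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def retrieve_context(question, context_entries, top_k=2):
    """Return up to top_k entries sharing at least one token with the question.

    Keyword overlap is an assumed stand-in for a real retrieval step.
    """
    question_tokens = tokenize(question)
    scored = [
        (len(question_tokens & tokenize(entry)), index, entry)
        for index, entry in enumerate(context_entries)
    ]
    relevant = [item for item in scored if item[0] > 0]
    # Highest overlap first; ties keep the original document order.
    relevant.sort(key=lambda item: (-item[0], item[1]))
    return [entry for _, _, entry in relevant[:top_k]]


def build_prompt(question, schema, context_entries=None):
    """Assemble the prompt; business context is added only when supplied."""
    parts = [
        "You translate questions into SQL for the schema below.",
        "Schema:\n" + schema,
    ]
    if context_entries:
        snippets = retrieve_context(question, context_entries)
        if snippets:
            bullet_list = "\n".join("- " + snippet for snippet in snippets)
            parts.append("Business context:\n" + bullet_list)
    parts.append("Question: " + question)
    parts.append("Return a single SQL query.")
    return "\n\n".join(parts)


question = "How many serious cases were reported for aspirin?"
schema_only_prompt = build_prompt(question, SCHEMA)
with_context_prompt = build_prompt(question, SCHEMA, CONTEXT_ENTRIES)
```

In this sketch the schema-only prompt gives the model no way to know how "serious" maps onto a column value, while the context-enriched prompt carries that definition. This is the kind of gap the business context document is intended to close.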
