Abstract
There are two common ways in which developers incorporate proprietary and domain-specific data when building applications of Large Language Models (LLMs): Retrieval-Augmented Generation (RAG) and fine-tuning. RAG augments the prompt with the external data, while fine-tuning incorporates the additional knowledge into the model itself. However, the pros and cons of both approaches are not well understood. In this paper, we propose a pipeline for fine-tuning and RAG, and present the tradeoffs of both for multiple popular LLMs, including Llama2-13B, GPT-3.5, and GPT-4. Our pipeline consists of multiple stages, including extracting information from PDFs, generating questions and answers, using them for fine-tuning, and leveraging GPT-4 for evaluating the results. We propose metrics to assess the performance of different stages of the RAG and fine-tuning pipeline. We conduct an in-depth study on an agricultural dataset. Agriculture as an industry has not seen much penetration of AI, and we study a potentially disruptive application: what if we could provide location-specific insights to a farmer? Our results show the effectiveness of our dataset generation pipeline in capturing geographic-specific knowledge, and the quantitative and qualitative benefits of RAG and fine-tuning. Fine-tuning the model increases accuracy by over 6 percentage points (p.p.), and this gain is cumulative with RAG, which increases accuracy by a further 5 p.p. In one particular experiment, we also demonstrate that the fine-tuned model leverages information from across geographies to answer specific questions, increasing answer similarity from 47% to 72%. Overall, the results point to how systems built using LLMs can be adapted to respond to and incorporate knowledge across a dimension that is critical for a specific industry, paving the way for further applications of LLMs in other industrial domains.