Towards Enhanced RAC Accessibility: Leveraging Datasets and LLMs

  • 2024-05-14 18:41:07
  • Edison Jair Bejarano Sepulveda, Nicolai Potes Hector, Santiago Pineda Montoya, Felipe Ivan Rodriguez, Jaime Enrique Orduy, Alec Rosales Cabezas, Danny TraslaviƱa Navarrete, Sergio Madrid Farfan
  • 0

Abstract

This paper explores the potential of large language models (LLMs) to make theAeronautical Regulations of Colombia (RAC) more accessible. Given thecomplexity and extensive technicality of the RAC, this study introduces a novelapproach to simplifying these regulations for broader understanding. Bydeveloping the first-ever RAC database, which contains 24,478 expertly labeledquestion-and-answer pairs, and fine-tuning LLMs specifically for RACapplications, the paper outlines the methodology for dataset assembly,expert-led annotation, and model training. Utilizing the Gemma1.1 2b modelalong with advanced techniques like Unsloth for efficient VRAM usage and flashattention mechanisms, the research aims to expedite training processes. Thisinitiative establishes a foundation to enhance the comprehensibility andaccessibility of RAC, potentially benefiting novices and reducing dependence onexpert consultations for navigating the aviation industry's regulatorylandscape. You can visit the dataset(https://huggingface.co/somosnlp/gemma-1.1-2b-it_ColombiaRAC_FullyCurated_format_chatML_V1)and the model(https://huggingface.co/datasets/somosnlp/ColombiaRAC_FullyCurated) here.

 

Quick Read (beta)

loading the full paper ...