Retriv at BLP-2025 Task 2: Test-Driven Feedback-Guided Framework for Bangla-to-Python Code Generation

Abstract

Large Language Models (LLMs) have advanced the automated generation of code from natural language prompts. However, low-resource languages (LRLs) like Bangla remain underrepresented due to the limited availability of instruction-to-code datasets and evaluation benchmarks. To address this, the BLP Workshop at IJCNLP-AACL 2025 introduced a shared task on "Code Generation in Bangla". In this work, we propose a method that combines instruction prompting with a test-driven, feedback-guided iterative refinement process using a fine-tuned Qwen2.5-14B model. The model generates code from Bangla instructions, tests it against unit tests, and iteratively refines any failing outputs through three evaluation passes, using test feedback to guide each step. This approach helped our team "Retriv" to secure 2nd place in the shared task with a Pass@1 score of 0.934. The analysis highlights challenges in Bangla instruction understanding and Python code generation, emphasizing the need for targeted methods in LRLs. We made experimental scripts publicly available for the community.
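The generate, test, and refine loop described above can be illustrated with a minimal sketch. This is not the team's released implementation: the names `run_tests` and `refine`, the `generate` callable standing in for the fine-tuned model, the assert-style test format, and the reading of "three evaluation passes" as at most three generation attempts in total are all assumptions made for illustration.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical signature: (instruction, feedback from the previous pass or None) -> code
Generator = Callable[[str, Optional[str]], str]


def run_tests(code: str, tests: List[str]) -> Optional[str]:
    """Run candidate code, then each assert-style test in the same namespace.

    Returns None if everything passes, otherwise a short feedback string.
    Note: exec() on model output is unsafe outside a sandbox; this is a sketch.
    """
    namespace: dict = {}
    try:
        exec(code, namespace)
    except Exception as exc:
        return f"code failed to execute: {exc!r}"
    for test in tests:
        try:
            exec(test, namespace)
        except Exception as exc:
            return f"test failed: {test} ({exc!r})"
    return None


def refine(
    instruction: str,
    tests: List[str],
    generate: Generator,
    max_passes: int = 3,  # assumed interpretation of "three evaluation passes"
) -> Tuple[str, bool]:
    """Generate code, test it, and regenerate with test feedback until it passes."""
    feedback: Optional[str] = None
    code = ""
    for _ in range(max_passes):
        code = generate(instruction, feedback)
        feedback = run_tests(code, tests)
        if feedback is None:
            return code, True  # all tests passed on this pass
    return code, False  # still failing after the final pass
```

In practice `generate` would wrap a call to the model with a prompt built from the Bangla instruction and, on later passes, the failure message. Any function with the same signature can be substituted to exercise the loop without a model.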
