Visual Program Distillation with Template-Based Augmentation

  • 2025-05-25 07:38:41
  • Michal Shlapentokh-Rothman, Yu-Xiong Wang, Derek Hoiem

Abstract

Adapting visual programming or prompting large language models (LLMs) to generate executable code for visual tasks like visual question answering (VQA) for specialized tasks or domains remains challenging due to high annotation and inference costs. We propose a low-cost visual program distillation method that can be used for models with at most 1 billion parameters and requires no human-generated program annotations. We achieve this through synthetic data augmentation based on decoupling programs into higher-level skills, called templates, and their corresponding arguments. Experimental results show that, with a relatively small amount of question/answer data, small language models can generate high-quality specialized visual programs with the added benefit of much faster inference.
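
The idea of decoupling a program into a template and its arguments can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify how templates are extracted, so the sketch assumes that quoted string literals are the arguments and that everything else is the template. The function names `decouple`, `fill`, and `augment`, the `<ARGn>` slot syntax, and the example program calls `count` and `find` are all hypothetical.

```python
import re
from itertools import product

# Assumption: quoted string literals are a program's arguments;
# the remaining code structure is treated as the template.
ARG_PATTERN = re.compile(r'"([^"]*)"')


def decouple(program: str) -> tuple[str, list[str]]:
    """Split a program into a template with numbered slots and its arguments."""
    args: list[str] = []

    def to_slot(match: re.Match) -> str:
        args.append(match.group(1))
        # Keep the surrounding quotes; replace only the literal's content.
        return f'"<ARG{len(args) - 1}>"'

    template = ARG_PATTERN.sub(to_slot, program)
    return template, args


def fill(template: str, args: list[str]) -> str:
    """Substitute arguments back into the template's numbered slots."""
    program = template
    for index, arg in enumerate(args):
        program = program.replace(f"<ARG{index}>", arg)
    return program


def augment(template: str, candidates: list[list[str]]) -> list[str]:
    """Create synthetic programs from every combination of candidate arguments.

    candidates[i] lists the alternative values for slot i.
    """
    return [fill(template, list(combo)) for combo in product(*candidates)]


# Hypothetical visual program; `count` and `find` are illustrative names only.
program = 'count(find(image, "dog"))'
template, args = decouple(program)
synthetic = augment(template, [["cat", "car"]])
```

Here one template yields a new program for each substituted argument, which is the sense in which a small set of templates can be expanded into a larger synthetic training set.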

 
