Abstract
Visual Programming (VP) has emerged as a powerful framework for Visual Question Answering (VQA). By generating and executing bespoke code for each question, these methods demonstrate impressive compositional and reasoning capabilities, especially in few-shot and zero-shot scenarios. However, existing VP methods generate all code in a single function, resulting in code that is suboptimal in terms of both accuracy and interpretability. Inspired by human coding practices, we propose Recursive Visual Programming (RVP), which simplifies generated routines, provides more efficient problem solving, and can manage more complex data structures. RVP approaches VQA tasks with an iterative, recursive code generation strategy, allowing complicated problems to be decomposed into smaller parts. Notably, RVP is capable of dynamic type assignment, i.e., as the system recursively generates a new piece of code, it autonomously determines the appropriate return type and crafts the requisite code to generate that output. We show RVP's efficacy through extensive experiments on benchmarks including VSR, COVR, GQA, and NextQA, underscoring the value of adopting human-like recursive and modular programming techniques for solving VQA tasks through coding.
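To make the recursive decomposition and dynamic return typing concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a code generator that returns Python source for a given question; here that generator is replaced by a hypothetical lookup table (`CANNED_PROGRAMS`), and the "image" is a plain list of object labels. The names `generate_code`, `recursive_query`, `solve`, and `ask` are all invented for this example.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for a code-generating language model. Each entry maps a
# question to source code defining solve(scene, ask). A real system would
# generate this code on the fly; the table is an assumption for illustration.
CANNED_PROGRAMS: Dict[str, str] = {
    "Are there more cats than dogs?": (
        "def solve(scene, ask):\n"
        "    cats = ask('How many cats are there?')\n"
        "    dogs = ask('How many dogs are there?')\n"
        "    return 'yes' if cats > dogs else 'no'\n"
    ),
    "How many cats are there?": (
        "def solve(scene, ask):\n"
        "    return scene.count('cat')\n"
    ),
    "How many dogs are there?": (
        "def solve(scene, ask):\n"
        "    return scene.count('dog')\n"
    ),
}


def generate_code(question: str) -> str:
    """Return source code for one question (hypothetical generator)."""
    return CANNED_PROGRAMS[question]


def recursive_query(scene: List[str], question: str,
                    depth: int = 0, max_depth: int = 3) -> Any:
    """Generate code for `question`, run it, and let it recurse via `ask`."""
    if depth > max_depth:
        raise RecursionError("sub-question nesting exceeded max_depth")

    namespace: Dict[str, Any] = {}
    exec(generate_code(question), namespace)  # defines solve() in namespace

    def ask(sub_question: str) -> Any:
        # Each sub-question gets its own generated routine, whose return
        # type (int, str, ...) is decided by that routine, not fixed here.
        return recursive_query(scene, sub_question, depth + 1, max_depth)

    solve: Callable[[List[str], Callable[[str], Any]], Any] = namespace["solve"]
    return solve(scene, ask)


scene = ["cat", "dog", "cat"]
answer = recursive_query(scene, "Are there more cats than dogs?")
```

In this sketch the top-level routine returns a string while the two counting sub-routines return integers, which mirrors the idea that each recursively generated piece of code chooses its own return type. The depth limit is an added safeguard against unbounded recursion and is not something the abstract describes.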