VolDoGer: LLM-assisted Datasets for Domain Generalization in Vision-Language Tasks

  • 2025-07-24 11:08:59
  • Juhwan Choi, Junehyoung Kwon, JungMin Yun, Seunguk Yu, YoungBin Kim
  • 0

Abstract

Domain generalizability is a crucial aspect of a deep learning model since itdetermines the capability of the model to perform well on data from unseendomains. However, research on the domain generalizability of deep learningmodels for vision-language tasks remains limited, primarily because of the lackof required datasets. To address these challenges, we propose VolDoGer:Vision-Language Dataset for Domain Generalization, a dedicated dataset designedfor domain generalization that addresses three vision-language tasks: imagecaptioning, visual question answering, and visual entailment. We constructedVolDoGer by extending LLM-based data annotation techniques to vision-languagetasks, thereby alleviating the burden of recruiting human annotators. Weevaluated the domain generalizability of various models, ranging fromfine-tuned models to a recent multimodal large language model, throughVolDoGer.

 

Quick Read (beta)

loading the full paper ...