LLM-empowered Dynamic Prompt Routing for Vision-Language Models Tuning under Long-Tailed Distributions

  • 2025-08-21 16:12:06
  • Yongju Jia, Jiarui Ma, Xiangxian Li, Baiqiao Zhang, Xianhui Cao, Juan Liu, Yulong Bian

Abstract

Pre-trained vision-language models (VLMs), such as CLIP, have demonstrated impressive capability in visual tasks, but their fine-tuning often suffers from bias in class-imbalanced scenarios. Recent works have introduced large language models (LLMs) to enhance VLM fine-tuning by supplementing semantic information. However, they often overlook the inherent class imbalance in VLMs' pre-training, which may lead to bias accumulation in downstream tasks. To address this problem, this paper proposes a Multi-dimensional Dynamic Prompt Routing (MDPR) framework. MDPR constructs a comprehensive knowledge base for classes that spans five visual-semantic dimensions. During fine-tuning, the dynamic routing mechanism aligns global visual classes, retrieves optimal prompts, and balances fine-grained semantics, yielding stable predictions through logits fusion. Extensive experiments on long-tailed benchmarks, including CIFAR-LT, ImageNet-LT, and Places-LT, demonstrate that MDPR achieves results comparable to current state-of-the-art (SOTA) methods. Ablation studies further confirm the effectiveness of our semantic library for tail classes and show that our dynamic routing incurs minimal computational overhead, making MDPR a flexible and efficient enhancement for VLM fine-tuning under data imbalance.
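The abstract names the routing steps (retrieving prompts, balancing semantics, fusing logits) but gives no formulas. What follows is a minimal sketch that makes the general idea concrete. It is not the authors' MDPR implementation. It makes three assumptions. First, each class owns a few prompt embeddings, for example one per visual-semantic dimension. Second, "routing" means keeping the top-k prompts by cosine similarity to the image embedding. Third, "logits fusion" is a convex combination of base-model logits and routed logits weighted by `alpha`. The names `routed_logits`, `fuse_logits`, `top_k`, and `alpha` are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def routed_logits(image_emb: Vector,
                  prompt_bank: Dict[str, List[Vector]],
                  top_k: int = 2) -> Dict[str, float]:
    """Hypothetical routing step: for each class, keep the top_k prompt
    embeddings most similar to the image and average their similarities
    into one class logit. Assumes every class has at least one prompt."""
    logits = {}
    for cls, prompts in prompt_bank.items():
        if not prompts:
            raise ValueError(f"class {cls!r} has no prompts")
        sims = sorted((cosine(image_emb, p) for p in prompts), reverse=True)
        chosen = sims[:top_k]
        logits[cls] = sum(chosen) / len(chosen)
    return logits


def fuse_logits(base: Dict[str, float],
                routed: Dict[str, float],
                alpha: float = 0.5) -> Dict[str, float]:
    """Hypothetical fusion step: convex combination of base-model logits
    and routed-prompt logits, class by class."""
    return {c: alpha * base[c] + (1 - alpha) * routed[c] for c in base}


# Toy usage with 2-D embeddings (illustrative numbers, not from the paper).
image = [1.0, 0.0]
bank = {
    "cat": [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]],
    "dog": [[0.0, 1.0], [0.6, 0.8], [-1.0, 0.0]],
}
base = {"cat": 0.2, "dog": 0.4}
fused = fuse_logits(base, routed_logits(image, bank, top_k=2), alpha=0.5)
print(max(fused, key=fused.get))  # prints: cat
```

In this toy case the base logits slightly favour "dog". The routed prompts for "cat" match the image far better, so the fused prediction flips to "cat". A fused scheme is meant to let semantic prompts correct a biased base prediction in this way. The real method's scoring and weighting may differ.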
