MobileVLM V2: Faster and Stronger Baseline for Vision Language Model

  • 2024-02-06 07:16:36
  • Xiangxiang Chu, Limeng Qiao, Xinyu Zhang, Shuang Xu, Fei Wei, Yang Yang, Xiaofei Sun, Yiming Hu, Xinyang Lin, Bo Zhang, Chunhua Shen

Abstract

We introduce MobileVLM V2, a family of significantly improved vision language models upon MobileVLM, which proves that a delicate orchestration of novel architectural design, an improved training scheme tailored for mobile VLMs, and rich high-quality dataset curation can substantially benefit VLMs' performance. Specifically, MobileVLM V2 1.7B achieves better or on-par performance on standard VLM benchmarks compared with much larger VLMs at the 3B scale. Notably, our 3B model outperforms a large variety of VLMs at the 7B+ scale. Our models will be released at https://github.com/Meituan-AutoML/MobileVLM .

 
