Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities

  • 2023-08-24 18:59:17
  • Jinze Bai, Shuai Bai, Shusheng Yang, Shijie Wang, Sinan Tan, Peng Wang, Junyang Lin, Chang Zhou, Jingren Zhou

Abstract

We introduce the Qwen-VL series, a set of large-scale vision-language models designed to perceive and understand both text and images. Comprising Qwen-VL and Qwen-VL-Chat, these models exhibit remarkable performance in tasks like image captioning, question answering, visual localization, and flexible interaction. The evaluation covers a wide range of tasks including zero-shot captioning, visual or document visual question answering, and grounding. We demonstrate that Qwen-VL outperforms existing Large Vision Language Models (LVLMs). We present their architecture, training, capabilities, and performance, highlighting their contributions to advancing multimodal artificial intelligence. Code, demo and models are available at https://github.com/QwenLM/Qwen-VL.

 
