VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models

  • 2024-09-11 18:10:36
  • Haodong Duan, Junming Yang, Yuxuan Qiao, Xinyu Fang, Lin Chen, Yuan Liu, Amit Agarwal, Zhe Chen, Mo Li, Yubo Ma, Hailong Sun, Xiangyu Zhao, Junbo Cui, Xiaoyi Dong, Yuhang Zang, Pan Zhang, Jiaqi Wang, Dahua Lin, Kai Chen

Abstract

We present VLMEvalKit: an open-source toolkit for evaluating large multi-modality models based on PyTorch. The toolkit aims to provide a user-friendly and comprehensive framework for researchers and developers to evaluate existing multi-modality models and publish reproducible evaluation results. In VLMEvalKit, we implement over 70 different large multi-modality models, including both proprietary APIs and open-source models, as well as more than 20 different multi-modal benchmarks. By implementing a single interface, new models can be easily added to the toolkit, while the toolkit automatically handles the remaining workloads, including data preparation, distributed inference, prediction post-processing, and metric calculation. Although the toolkit is currently mainly used for evaluating large vision-language models, its design is compatible with future updates that incorporate additional modalities, such as audio and video. Based on the evaluation results obtained with the toolkit, we host OpenVLM Leaderboard, a comprehensive leaderboard to track the progress of multi-modality learning research. The toolkit is released at https://github.com/open-compass/VLMEvalKit and is actively maintained.
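
The "single interface" design described in the abstract can be illustrated with a minimal sketch. This is not VLMEvalKit's actual API: the class `MultiModalModel`, the method `generate`, the message format, the toy `EchoModel`, and the `evaluate` loop are all hypothetical names chosen for illustration. The sketch only shows the division of labour: a new model supplies one method, and the toolkit-side loop handles data preparation, inference, post-processing, and metric calculation. Distributed inference is omitted for brevity.

```python
from abc import ABC, abstractmethod


class MultiModalModel(ABC):
    """Hypothetical single interface a new model would implement."""

    @abstractmethod
    def generate(self, message):
        """Take a list of {"type": "image" | "text", "value": str} dicts, return a str."""


class EchoModel(MultiModalModel):
    """Toy stand-in model: answers with the last word of the text prompt."""

    def generate(self, message):
        text = " ".join(m["value"] for m in message if m["type"] == "text")
        return text.split()[-1]


def evaluate(model, dataset):
    """Toolkit-side loop (assumed): prepare data, run inference, post-process, score."""
    correct = 0
    for sample in dataset:
        # Data preparation: build an interleaved image/text message.
        message = [
            {"type": "image", "value": sample["image"]},
            {"type": "text", "value": sample["question"]},
        ]
        prediction = model.generate(message)        # inference
        prediction = prediction.strip().upper()     # prediction post-processing
        correct += prediction == sample["answer"]   # exact-match metric
    return correct / len(dataset)


# Made-up samples for illustration only; the image paths are never opened.
dataset = [
    {"image": "img_0.jpg", "question": "Pick an option: A", "answer": "A"},
    {"image": "img_1.jpg", "question": "Pick an option: b", "answer": "B"},
    {"image": "img_2.jpg", "question": "Pick an option: C", "answer": "D"},
]
accuracy = evaluate(EchoModel(), dataset)
```

Under this assumed design, supporting another model means writing one more subclass with its own `generate`; the evaluation loop and the benchmark data stay unchanged.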

 
