MultiZoo & MultiBench: A Standardized Toolkit for Multimodal Deep Learning

Abstract

Learning multimodal representations involves integrating information from multiple heterogeneous sources of data. In order to accelerate progress towards understudied modalities and tasks while ensuring real-world robustness, we release MultiZoo, a public toolkit consisting of standardized implementations of > 20 core multimodal algorithms and MultiBench, a large-scale benchmark spanning 15 datasets, 10 modalities, 20 prediction tasks, and 6 research areas. Together, these provide an automated end-to-end machine learning pipeline that simplifies and standardizes data loading, experimental setup, and model evaluation. To enable holistic evaluation, we offer a comprehensive methodology to assess (1) generalization, (2) time and space complexity, and (3) modality robustness. MultiBench paves the way towards a better understanding of the capabilities and limitations of multimodal models, while ensuring ease of use, accessibility, and reproducibility. Our toolkits are publicly available, will be regularly updated, and welcome inputs from the community.
