Dataset Creation for Visual Entailment using Generative AI

Abstract

In this paper we present and validate a new synthetic dataset for training visual entailment models. Existing datasets for visual entailment are small and sparse compared to datasets for textual entailment, and manually creating datasets is labor-intensive. We base our synthetic dataset on the SNLI dataset for textual entailment. We use the premise texts from SNLI as input prompts to a generative image model, Stable Diffusion, creating an image to replace each textual premise. We evaluate our dataset both intrinsically and extrinsically. For extrinsic evaluation, we assess the validity of the generated images by using them as training data for a visual entailment classifier based on CLIP feature vectors. We find that synthetic training data leads to only a slight drop in quality on SNLI-VE, with an F-score of 0.686 compared to 0.703 when trained on real data. We also compare the quality of our generated training data to the original training data on another dataset, SICK-VTE. Again, there is only a slight drop in F-score: from 0.400 to 0.384. These results indicate that in settings with data sparsity, synthetic data can be a promising solution for training visual entailment models.
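
The construction described above, replacing each textual premise with a generated image while keeping the hypothesis and label, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `TextPair`, `VisualPair`, and `build_visual_dataset` are hypothetical, and the image generator is passed in as a plain callable (assumed to wrap a text-to-image model such as Stable Diffusion and return a file path) so that the sketch stays self-contained. Reusing one image for a premise that occurs several times is likewise an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class TextPair:
    """One textual entailment example (hypothetical container)."""
    premise: str
    hypothesis: str
    label: str  # e.g. "entailment", "neutral", or "contradiction"


@dataclass(frozen=True)
class VisualPair:
    """The same example with the premise replaced by an image."""
    image_path: str
    hypothesis: str
    label: str


def build_visual_dataset(
    pairs: Iterable[TextPair],
    generate_image: Callable[[str], str],
) -> List[VisualPair]:
    """Replace each textual premise with a generated image.

    `generate_image` takes a prompt and returns the path of the image
    it produced. Each distinct premise is generated only once, since
    the same premise may be paired with several hypotheses.
    """
    image_for_premise: Dict[str, str] = {}
    visual_pairs: List[VisualPair] = []
    for pair in pairs:
        if pair.premise not in image_for_premise:
            # The premise text itself is used as the prompt.
            image_for_premise[pair.premise] = generate_image(pair.premise)
        visual_pairs.append(
            VisualPair(
                image_path=image_for_premise[pair.premise],
                hypothesis=pair.hypothesis,
                label=pair.label,
            )
        )
    return visual_pairs
```

Passing the generator as a callable keeps the data transformation independent of any particular image model, so the same function can be exercised with a stub during testing and with a real text-to-image pipeline when building the dataset.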
