Lecture Video Visual Objects (LVVO) Dataset: A Benchmark for Visual Object Detection in Educational Videos

Abstract

We introduce the Lecture Video Visual Objects (LVVO) dataset, a new benchmark for visual object detection in educational video content. The dataset consists of 4,000 frames extracted from 245 lecture videos spanning biology, computer science, and geosciences. A subset of 1,000 frames, referred to as LVVO_1k, has been manually annotated with bounding boxes for four visual categories: Table, Chart-Graph, Photographic-image, and Visual-illustration. Each frame was labeled independently by two annotators, resulting in an inter-annotator F1 score of 83.41%, indicating strong agreement. To ensure high-quality consensus annotations, a third expert reviewed and resolved all cases of disagreement through a conflict resolution process. To expand the dataset, a semi-supervised approach was employed to automatically annotate the remaining 3,000 frames, forming LVVO_3k. The complete dataset offers a valuable resource for developing and evaluating both supervised and semi-supervised methods for visual content detection in educational videos. The LVVO dataset is publicly available to support further research in this domain.
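
To make the inter-annotator F1 figure concrete, the following is a minimal sketch of how agreement between two annotators' bounding boxes might be computed for a single frame. The abstract does not state the matching rule, so everything here is an assumption for illustration: boxes are matched greedily by intersection-over-union (IoU), a match requires the same category label and an IoU of at least 0.5, and the names `iou` and `agreement_f1` are hypothetical rather than part of the dataset's tooling.

```python
from typing import List, Tuple

# (x1, y1, x2, y2, label); this box format is an assumption, not the dataset's schema.
Box = Tuple[float, float, float, float, str]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def agreement_f1(boxes_a: List[Box], boxes_b: List[Box],
                 iou_threshold: float = 0.5) -> float:
    """F1 agreement between two annotators on one frame.

    Assumed rule: same-label pairs are matched one-to-one, highest IoU first,
    and only pairs with IoU >= iou_threshold count as agreements.
    """
    if not boxes_a and not boxes_b:
        return 1.0  # both annotators agree that the frame has no objects

    # All same-label candidate pairs, best overlap first.
    pairs = sorted(
        ((iou(a, b), i, j)
         for i, a in enumerate(boxes_a)
         for j, b in enumerate(boxes_b)
         if a[4] == b[4]),
        reverse=True,
    )

    used_a, used_b = set(), set()
    for score, i, j in pairs:
        if score < iou_threshold:
            break  # remaining pairs overlap even less
        if i in used_a or j in used_b:
            continue  # each box may be matched at most once
        used_a.add(i)
        used_b.add(j)

    matched = len(used_a)
    # With precision = matched/|B| and recall = matched/|A|,
    # F1 simplifies to 2 * matched / (|A| + |B|).
    return 2 * matched / (len(boxes_a) + len(boxes_b))


# Illustrative (made-up) annotations for one frame.
annotator_a = [(0, 0, 10, 10, "Table"), (20, 20, 40, 40, "Chart-Graph")]
annotator_b = [(1, 1, 10, 10, "Table"), (100, 100, 120, 120, "Photographic-image")]
frame_f1 = agreement_f1(annotator_a, annotator_b)
```

In this made-up frame only the two Table boxes overlap enough to match, so one of two boxes per annotator agrees. A dataset-level score could then be obtained by pooling matches and box counts over all frames, though whether the reported 83.41% was pooled or averaged per frame is not stated in the abstract.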
