MSVCOD: A Large-Scale Multi-Scene Dataset for Video Camouflage Object Detection

Abstract

Video Camouflaged Object Detection (VCOD) is a challenging task that aims to identify objects that are seamlessly concealed within the background in videos. The dynamic properties of video enable detection of camouflaged objects through motion cues or varied perspectives. Previous VCOD datasets primarily contain animal objects, limiting the scope of research to wildlife scenarios. However, the applications of VCOD extend beyond wildlife and have significant implications in security, art, and medical fields. To address this gap, we construct a new large-scale multi-domain VCOD dataset, MSVCOD. To achieve high-quality annotations, we design a semi-automatic iterative annotation pipeline that reduces costs while maintaining annotation accuracy. Our MSVCOD is the largest VCOD dataset to date, introducing multiple object categories including human, animal, medical, and vehicle objects for the first time, while also expanding background diversity across various environments. This expanded scope increases the practical applicability of the VCOD task. Alongside this dataset, we introduce a one-stream video camouflaged object detection model that performs both feature extraction and information fusion without additional motion feature fusion modules. Our framework achieves state-of-the-art results on the existing VCOD animal dataset and the proposed MSVCOD. The dataset and code will be made publicly available.
