Contextual Sense Making by Fusing Scene Classification, Detections, and Events in Full Motion Video

Abstract

With the proliferation of imaging sensors, the volume of multi-modal imagery far exceeds the ability of human analysts to adequately consume and exploit it. Full motion video (FMV) poses the extra challenge of containing large amounts of redundant temporal data. We aim to address the needs of human analysts who must consume and exploit aerial FMV. We have investigated and designed a system capable of detecting events and activities of interest that deviate from the baseline patterns of observation in FMV feeds. We have divided the problem into three tasks: (1) context awareness, (2) object cataloging, and (3) event detection. The goal of context awareness is to constrain the problem of visual search and detection in video data. A custom image classifier categorizes the scene with one or more labels to identify the operating context and environment. This step helps reduce the semantic search space of downstream tasks in order to increase their accuracy. The second step is object cataloging, where an ensemble of object detectors locates and labels any known objects found in the scene (people, vehicles, boats, planes, buildings, etc.). Finally, context information and detections are sent to the event detection engine to monitor for certain behaviors. A series of analytics monitor the scene by tracking object counts and object interactions. If these object interactions are not declared to be commonly observed in the current scene, the system will report, geolocate, and log the event. Events of interest include a gathering of people (identified as a meeting and/or a crowd), boats on a beach unloading cargo, an increased count of people entering a building, and people getting in and/or out of vehicles of interest, among others. We have applied our methods to data from different sensors at different resolutions in a variety of geographical areas.
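The context-gated flow described above (scene labels narrowing the object classes considered, then analytics flagging counts and interactions that are not commonly observed) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The `Detection` record, the `CONTEXT_CLASSES` and `BASELINE_MAX` tables, and the functions `filter_detections`, `count_events`, and `has_gathering` are all hypothetical. The classifier and detector outputs are stubbed as plain data, and a simple proximity rule stands in for gathering detection.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One object detection (hypothetical record; box center in pixels)."""
    label: str
    x: float
    y: float
    score: float


# Hypothetical: object classes considered plausible in each scene context.
CONTEXT_CLASSES = {
    "beach": {"person", "boat", "vehicle"},
    "urban": {"person", "vehicle", "building"},
    "airfield": {"person", "vehicle", "plane", "building"},
}

# Hypothetical baseline: largest per-frame count of each class that is
# "commonly observed" in a context. Counts above it are reported.
BASELINE_MAX = {
    "beach": {"person": 10, "boat": 0, "vehicle": 2},
    "urban": {"person": 30, "vehicle": 20, "building": 50},
    "airfield": {"person": 5, "vehicle": 5, "plane": 3, "building": 10},
}


def filter_detections(detections, scene_labels, min_score=0.5):
    """Keep confident detections whose class fits at least one scene label."""
    allowed = set()
    for scene in scene_labels:
        allowed |= CONTEXT_CLASSES.get(scene, set())
    return [d for d in detections if d.label in allowed and d.score >= min_score]


def count_events(detections, scene_labels):
    """Return (label, count, limit) for classes whose count exceeds the baseline.

    With several scene labels, the most permissive (largest) limit is used.
    """
    events = []
    for label, n in Counter(d.label for d in detections).items():
        limits = [BASELINE_MAX[s][label] for s in scene_labels
                  if label in BASELINE_MAX.get(s, {})]
        if limits and n > max(limits):
            events.append((label, n, max(limits)))
    return events


def has_gathering(detections, radius=50.0, min_people=4):
    """True if some person has at least min_people people (itself included)
    within radius pixels: a crude proxy for a meeting or crowd."""
    people = [d for d in detections if d.label == "person"]
    return any(
        sum(math.hypot(p.x - q.x, p.y - q.y) <= radius for q in people) >= min_people
        for p in people
    )


if __name__ == "__main__":
    scene = ["beach"]  # stand-in for the scene classifier's output
    raw = [
        Detection("person", 0, 0, 0.9),
        Detection("boat", 10, 10, 0.8),
        Detection("plane", 5, 5, 0.95),  # dropped: not plausible on a beach
        Detection("person", 1, 1, 0.3),  # dropped: low confidence
    ]
    kept = filter_detections(raw, scene)
    print(count_events(kept, scene))  # [('boat', 1, 0)]
```

Taking the largest limit across scene labels avoids false alarms when a frame carries several context labels; a stricter variant could use the smallest limit instead.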
