Finding Person Relations in Image Data of the Internet Archive

  • 2018-06-21 13:48:21
  • Eric Müller-Budack, Kader Pustu-Iren, Sebastian Diering, Ralph Ewerth

Abstract

The multimedia content in the World Wide Web is rapidly growing and contains valuable information for many applications in different domains. For this reason, the Internet Archive initiative has been gathering billions of time-versioned web pages since the mid-nineties. However, the huge amount of data is rarely labeled with appropriate metadata and automatic approaches are required to enable semantic search. Normally, the textual content of the Internet Archive is used to extract entities and their possible relations across domains such as politics and entertainment, whereas image and video content is usually neglected. In this paper, we introduce a system for person recognition in image content of web news stored in the Internet Archive. Thus, the system complements entity recognition in text and allows researchers and analysts to track media coverage and relations of persons more precisely. Based on a deep learning face recognition approach, we suggest a system that automatically detects persons of interest and gathers sample material, which is subsequently used to identify them in the image data of the Internet Archive. We evaluate the performance of the face recognition system on an appropriate standard benchmark dataset and demonstrate the feasibility of the approach with two use cases.

 
