SportsSloMo: A New Benchmark and Baselines for Human-centric Video Frame Interpolation

  • 2023-08-31 18:23:50
  • Jiaben Chen, Huaizu Jiang

Abstract

Human-centric video frame interpolation has great potential for improving people's entertainment experiences and finding commercial applications in the sports analysis industry, e.g., synthesizing slow-motion videos. Although there are multiple benchmark datasets available in the community, none of them is dedicated to human-centric scenarios. To bridge this gap, we introduce SportsSloMo, a benchmark consisting of more than 130K video clips and 1M video frames of high-resolution ($\geq$720p) slow-motion sports videos crawled from YouTube. We re-train several state-of-the-art methods on our benchmark, and the results show a decrease in their accuracy compared to other datasets. This highlights the difficulty of our benchmark and suggests that it poses significant challenges even for the best-performing methods, as human bodies are highly deformable and occlusions are frequent in sports videos. To improve the accuracy, we introduce two loss terms considering human-aware priors, where we add auxiliary supervision to panoptic segmentation and human keypoint detection, respectively. The loss terms are model-agnostic and can be easily plugged into any video frame interpolation approach. Experimental results validate the effectiveness of our proposed loss terms, leading to consistent performance improvement over five existing models, which establish strong baseline models on our benchmark. The dataset and code can be found at: https://neu-vi.github.io/SportsSlomo/.

 
