Dataset Culling: Towards Efficient Training Of Distillation-Based Domain Specific Models

  • 2019-02-01 04:23:32
  • Kentaro Yoshioka, Edward Lee, Simon Wong, Mark Horowitz
  • 36

Abstract

Real-time CNN-based object detection models for applications like surveillance can achieve high accuracy but require extensive computation. Recent work has shown a 10 to 100x reduction in computation cost with domain-specific network settings. However, this prior work focused on inference only: if the domain network requires frequent retraining, training and retraining costs can be a significant bottleneck. To address training costs, we propose Dataset Culling: a pipeline that significantly reduces the training dataset size required for domain-specific models. Dataset Culling reduces the dataset size by filtering out data that is non-essential for training, and by reducing the size of each image until detection degrades. Both operations use a confusion loss metric, which enables us to execute the culling with minimal computation overhead. On a custom long-duration dataset, we show that Dataset Culling can reduce training costs by 47x with no accuracy loss, or even with slight improvements. Code is available at https://github.com/kentaroy47/DatasetCulling
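The two culling operations described above can be illustrated with a minimal sketch. The abstract does not define the confusion loss, so the code below substitutes a stand-in, the summed binary entropy of the detection confidences, which is largest when the detector is unsure (confidence near 0.5). The names `confusion_score`, `cull`, and `smallest_safe_scale`, the assumption that the most confusing samples are the essential ones, and the `tolerance` threshold are all hypothetical and are not taken from the paper or its repository.

```python
import math


def confusion_score(confidences, eps=1e-6):
    """Stand-in confusion score (assumption, not the paper's metric).

    Sums the binary entropy of each detection confidence, so detections
    near 0.5 contribute the most and confident ones contribute little.
    """
    total = 0.0
    for q in confidences:
        q = min(max(q, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(q * math.log(q) + (1.0 - q) * math.log(1.0 - q))
    return total


def cull(samples, keep):
    """Keep the `keep` samples with the highest confusion score.

    Each sample is a dict with a "confidences" list. Treating high
    confusion as "essential for training" is an assumption here.
    """
    ranked = sorted(
        samples,
        key=lambda s: confusion_score(s["confidences"]),
        reverse=True,
    )
    return ranked[:keep]


def smallest_safe_scale(score_at_scale, scales, tolerance):
    """Pick the smallest image scale before the score degrades.

    `scales` must be sorted from largest to smallest; `score_at_scale`
    is a caller-supplied function returning the confusion score at a
    given scale. The walk stops at the first scale whose score exceeds
    the full-resolution baseline by more than `tolerance`.
    """
    baseline = score_at_scale(scales[0])
    chosen = scales[0]
    for scale in scales[1:]:
        if score_at_scale(scale) - baseline > tolerance:
            break
        chosen = scale
    return chosen
```

For example, `cull(samples, keep=2)` returns the two samples whose detections are least certain, and `smallest_safe_scale` can be called per image with a function that runs the detector on a resized copy. Because both steps only need detector confidences, no extra labels are required in this sketch.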

 
