PP-OCR: A Practical Ultra Lightweight OCR System

  • 2020-09-21 14:57:18
  • Yuning Du, Chenxia Li, Ruoyu Guo, Xiaoting Yin, Weiwei Liu, Jun Zhou, Yifan Bai, Zilin Yu, Yehua Yang, Qingqing Dang, Haoshuang Wang

Abstract

Optical Character Recognition (OCR) systems have been widely used in various application scenarios, such as office automation (OA) systems, factory automation, online education, and map production. However, OCR is still a challenging task due to the variety of text appearances and the demand for computational efficiency. In this paper, we propose a practical ultra lightweight OCR system, PP-OCR. The overall model size of PP-OCR is only 3.5M for recognizing 6622 Chinese characters and 2.8M for recognizing 63 alphanumeric symbols. We introduce a bag of strategies to either enhance the model ability or reduce the model size. The corresponding ablation experiments with real data are also provided. Meanwhile, several pre-trained models for Chinese and English recognition are released, including a text detector (97K images are used), a direction classifier (600K images are used), and a text recognizer (17.9M images are used). In addition, the proposed PP-OCR is also verified on several other language recognition tasks, including French, Korean, Japanese and German. All of the above-mentioned models are open-sourced, and the code is available in the GitHub repository at https://github.com/PaddlePaddle/PaddleOCR.

 
