Chinese/English mixed Character Segmentation as Semantic Segmentation

Nov 16, 2016
Huabin Zheng, Jingyu Wang, Zhengjie Huang, Yang Yang, Rong Pan

OCR character segmentation for multilingual printed documents is difficult because characters differ widely across scripts. Previous approaches mainly focus on monolingual text and are not suitable for multilingual cases. In this work, we tackle the Chinese/English mixed case by reframing it as a semantic segmentation problem. We build on fully convolutional networks (FCN), an architecture that has been successful in semantic segmentation. Given a wide enough receptive field, an FCN can use the necessary context around a horizontal position to determine whether that position is a splitting point. As a deep neural architecture, an FCN can automatically learn useful features from raw text line images. Although trained on synthesized samples with simulated random disturbance, our FCN model generalizes well to real-world samples. The experimental results show that our model significantly outperforms previous methods.
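To make the per-position framing concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes the network emits one split probability per horizontal pixel column of a text line image. The function names `receptive_field` and `split_points`, the 0.5 threshold, and the rule of collapsing each run of split columns to its centre are all assumptions made for illustration. The first function uses the standard receptive-field recurrence for stacked convolution or pooling layers; the second turns per-column probabilities into splitting positions.

```python
from typing import List, Tuple


def receptive_field(layers: List[Tuple[int, int]]) -> int:
    """Number of input columns seen by one output position.

    `layers` lists (kernel_size, stride) for each conv/pool layer, in order.
    """
    rf, jump = 1, 1  # jump = distance in input columns between adjacent outputs
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


def split_points(probs: List[float], threshold: float = 0.5) -> List[int]:
    """Collapse each run of columns with probability >= threshold to its centre.

    `probs` is a hypothetical per-column split probability; the threshold
    and the centre rule are illustrative choices, not taken from the paper.
    """
    points: List[int] = []
    start = None  # first column of the current run, if any
    for x, p in enumerate(probs):
        if p >= threshold:
            if start is None:
                start = x
        elif start is not None:
            points.append((start + x - 1) // 2)
            start = None
    if start is not None:  # run that reaches the right edge of the line
        points.append((start + len(probs) - 1) // 2)
    return points


if __name__ == "__main__":
    # Three stacked 3-wide, stride-1 layers see 7 input columns.
    print(receptive_field([(3, 1), (3, 1), (3, 1)]))
    # Two runs of high split probability: columns 1-3 and column 6.
    print(split_points([0.1, 0.9, 0.8, 0.7, 0.1, 0.1, 0.6, 0.2]))
```

A wider receptive field lets each column's decision draw on more neighbouring context, which is the property the abstract relies on when characters of very different widths share one line.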

* Submitted to CVPR 2017 
