Chris Tensmeyer

Convolutional Neural Networks for Font Classification

Aug 11, 2017

Chris Tensmeyer, Daniel Saunders, Tony Martinez

Abstract: Classifying pages or text lines into font categories aids transcription because single-font Optical Character Recognition (OCR) is generally more accurate than omni-font OCR. We present a simple framework based on Convolutional Neural Networks (CNNs), in which a CNN is trained to classify small patches of text into predefined font classes. To classify page or line images, we average the CNN predictions over densely extracted patches. We show that this method achieves state-of-the-art performance on a challenging dataset of 40 Arabic computer fonts, with 98.8% line-level accuracy. The same method also achieves the highest reported accuracy, 86.6%, in predicting paleographic scribal script classes at the page level on medieval Latin manuscripts. Finally, we analyze the features the CNN learns on Latin manuscripts and find evidence that it learns both the defining morphological differences between scribal script classes and class-correlated nuisance factors, to which it overfits. We propose a novel form of data augmentation that improves robustness to text darkness, further increasing classification performance.
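The patch-averaging step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the image is assumed to be a grayscale 2D list, `classify_patch` is a hypothetical stand-in for the trained CNN that returns one probability per font class, and the patch size and stride are arbitrary example values.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]

def extract_patches(image: Image, size: int, stride: int) -> List[Image]:
    """Densely extract size x size patches on a regular grid with the given stride."""
    h, w = len(image), len(image[0])
    patches = []
    for top in range(0, h - size + 1, stride):
        for left in range(0, w - size + 1, stride):
            patches.append([row[left:left + size] for row in image[top:top + size]])
    return patches

def classify_by_patch_average(
    image: Image,
    classify_patch: Callable[[Image], Sequence[float]],
    size: int,
    stride: int,
) -> Tuple[int, List[float]]:
    """Average per-patch class probabilities and return (best class, averaged probabilities)."""
    patches = extract_patches(image, size, stride)
    if not patches:
        raise ValueError("image is smaller than the patch size")
    totals: List[float] = []
    for patch in patches:
        probs = classify_patch(patch)
        if not totals:
            totals = [0.0] * len(probs)
        for k, p in enumerate(probs):
            totals[k] += p
    avg = [t / len(patches) for t in totals]
    best = max(range(len(avg)), key=avg.__getitem__)
    return best, avg

def toy_classifier(patch: Image) -> List[float]:
    """Hypothetical stand-in for a CNN: class 0 if the patch is dark, class 1 otherwise."""
    mean = sum(sum(row) for row in patch) / (len(patch) * len(patch[0]))
    return [1.0, 0.0] if mean < 128 else [0.0, 1.0]
```

Averaging probabilities rather than taking a majority vote lets confident patches outweigh ambiguous ones (for example, patches that are mostly background).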

* ICDAR 2017

Document Image Binarization with Fully Convolutional Neural Networks

Aug 10, 2017

Chris Tensmeyer, Tony Martinez

Abstract: Binarization of degraded historical manuscript images is an important pre-processing step for many document processing tasks. We formulate binarization as a pixel classification learning task and apply a novel Fully Convolutional Network (FCN) architecture that operates at multiple image scales, including full resolution. The FCN is trained to optimize a continuous version of the Pseudo F-measure metric, and an ensemble of FCNs outperforms the competition winners on 4 of 7 DIBCO competitions. The same binarization technique can also be applied, with good performance, to other domains such as palm leaf manuscripts. We analyze the performance of the proposed model with respect to the architectural hyperparameters, the size and diversity of the training data, and the choice of input features.
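To show what a "continuous version" of an F-measure means, the sketch below replaces hard true-positive counts with sums of predicted foreground probabilities, which makes the metric differentiable with respect to the predictions. This is a simplified illustration under stated assumptions, not the paper's exact loss: ink pixels are labeled 1, and the optional `recall_weights` argument is a hypothetical hook standing in for the reweighting that distinguishes the Pseudo F-measure from the plain F-measure.

```python
from typing import Optional, Sequence

def continuous_f_measure(
    probs: Sequence[float],
    targets: Sequence[float],
    recall_weights: Optional[Sequence[float]] = None,
    eps: float = 1e-8,
) -> float:
    """Soft F-measure: counts are replaced by sums of foreground probabilities."""
    if recall_weights is None:
        recall_weights = [1.0] * len(targets)  # plain (unweighted) recall
    # Soft true positives: probability mass placed on true ink pixels.
    tp = sum(p * g for p, g in zip(probs, targets))
    precision = tp / (sum(probs) + eps)
    # Recall, optionally weighted per pixel (hypothetical stand-in for pseudo-recall).
    weighted_tp = sum(w * p * g for w, p, g in zip(recall_weights, probs, targets))
    weighted_pos = sum(w * g for w, g in zip(recall_weights, targets))
    recall = weighted_tp / (weighted_pos + eps)
    return 2 * precision * recall / (precision + recall + eps)

def f_measure_loss(probs: Sequence[float], targets: Sequence[float]) -> float:
    """Loss to minimize: 1 minus the soft F-measure."""
    return 1.0 - continuous_f_measure(probs, targets)
```

With hard 0/1 predictions this reduces to the usual F-measure; with probabilities it gives partial credit, so gradients flow to every pixel.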

* ICDAR 2017 (oral)

Analysis of Convolutional Neural Networks for Document Image Classification

Aug 10, 2017

Chris Tensmeyer, Tony Martinez

Abstract: Convolutional Neural Networks (CNNs) are state-of-the-art models for document image classification tasks. However, many of these approaches rely on parameters and architectures designed for classifying natural images, which differ from document images. We question whether this is appropriate and conduct a large empirical study to determine which aspects of CNNs most affect performance on document images. Among other results, we exceed the state of the art on the RVL-CDIP dataset by using shear transform data augmentation and an architecture designed for a larger input image. Additionally, we analyze the learned features and find evidence that CNNs trained on RVL-CDIP learn region-specific layout features.
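Shear transform augmentation, mentioned above, can be sketched as a horizontal shear that shifts each row in proportion to its distance from the vertical center. This is an illustrative sketch, not the paper's code: the image is assumed to be a grayscale 2D list, uncovered pixels are filled with white (255) on the assumption of a light page background, and the `max_shear` range is a hypothetical example value.

```python
import math
import random
from typing import List, Optional

Image = List[List[float]]

def shear_image(image: Image, shear: float, fill: float = 255) -> Image:
    """Horizontally shear an image with nearest-neighbor sampling around its center row."""
    h, w = len(image), len(image[0])
    cy = (h - 1) / 2.0
    out = []
    for r in range(h):
        offset = shear * (r - cy)  # rows farther from the center shift more
        row = []
        for c in range(w):
            src = math.floor(c - offset + 0.5)  # nearest source column (round half up)
            row.append(image[r][src] if 0 <= src < w else fill)
        out.append(row)
    return out

def random_shear(image: Image, max_shear: float = 0.2,
                 rng: Optional[random.Random] = None) -> Image:
    """Apply a shear factor drawn uniformly from [-max_shear, max_shear]."""
    rng = rng or random.Random()
    return shear_image(image, rng.uniform(-max_shear, max_shear))
```

Shearing mimics slanted text and skewed scans while preserving the overall page layout, which is plausibly why it suits document images better than augmentations borrowed from natural-image pipelines.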

* Accepted ICDAR 2017