
Chris Tensmeyer


Language Model Supervision for Handwriting Recognition Model Adaptation

Aug 04, 2018
Chris Tensmeyer, Curtis Wigington, Brian Davis, Seth Stewart, Tony Martinez, William Barrett

[Figures 1–3]

Training state-of-the-art offline handwriting recognition (HWR) models requires large labeled datasets, but unfortunately such datasets are not available in all languages and domains due to the high cost of manual labeling. We address this problem by showing how high resource languages can be leveraged to help train models for low resource languages. We propose a transfer learning methodology where we adapt HWR models trained on a source language to a target language that uses the same writing script. This methodology only requires labeled data in the source language, unlabeled data in the target language, and a language model of the target language. The language model is used in a bootstrapping fashion to refine predictions in the target language for use as ground truth in training the model. Using this approach we demonstrate improved transferability among French, English, and Spanish languages using both historical and modern handwriting datasets. In the best case, transferring with the proposed methodology results in character error rates nearly as good as fully supervised training.
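The bootstrapping step can be sketched as follows: the HWR model proposes candidate transcriptions for an unlabeled target-language image, a character-level language model rescores them, and the best-scoring hypothesis becomes a pseudo-label for the next round of training. All names here (`refine_prediction`, the bigram LM, the weight `alpha`) are illustrative assumptions, not the paper's implementation.

```python
def lm_score(text, bigram_logprob, unseen=-6.0):
    # Character-bigram language model score of a hypothesis (log domain).
    # Unseen bigrams get a fixed low log-probability.
    return sum(bigram_logprob.get((a, b), unseen) for a, b in zip(text, text[1:]))

def refine_prediction(candidates, bigram_logprob, alpha=1.0):
    # candidates: (hypothesis, recognizer_logprob) pairs from the HWR model.
    # Combine recognizer confidence with the target-language LM score and
    # keep the best hypothesis as a pseudo-label for retraining.
    return max(candidates,
               key=lambda c: c[1] + alpha * lm_score(c[0], bigram_logprob))[0]
```

For example, with a bigram LM that assigns low probability to the pair ('h', 'c'), the LM overrides the recognizer's slight preference for the garbled hypothesis "thc" and selects "the" instead.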


PageNet: Page Boundary Extraction in Historical Handwritten Documents

Sep 05, 2017
Chris Tensmeyer, Brian Davis, Curtis Wigington, Iain Lee, Bill Barrett

[Figures 1–4]

When digitizing a document into an image, it is common to include a surrounding border region to visually indicate that the entire document is present in the image. However, this border should be removed prior to automated processing. In this work, we present a deep learning based system, PageNet, which identifies the main page region in an image in order to segment content from both textual and non-textual border noise. In PageNet, a Fully Convolutional Network obtains a pixel-wise segmentation which is post-processed into the output quadrilateral region. We evaluate PageNet on 4 collections of historical handwritten documents and obtain over 94% mean intersection over union on all datasets and approach human performance on 2 of these collections. Additionally, we show that PageNet can segment documents that are overlaid on top of other documents.
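The post-processing step, going from a pixel-wise page mask to a quadrilateral, can be sketched with a simple corner heuristic. This is a minimal sketch under assumed details (the paper's actual quadrilateral fitting may differ): the four extremes of x+y and x−y over the mask pixels serve as corners.

```python
import numpy as np

def mask_to_quad(mask):
    # mask: 2D boolean array, True = page pixel (thresholded FCN output).
    # Return four corners (TL, TR, BR, BL) as (x, y) tuples using the
    # classic extreme-point heuristic: min/max of x+y and x-y.
    ys, xs = np.nonzero(mask)
    pts = np.stack([xs, ys], axis=1)
    s = pts.sum(axis=1)          # x + y
    d = pts[:, 0] - pts[:, 1]    # x - y
    return (tuple(pts[s.argmin()]),  # top-left
            tuple(pts[d.argmax()]),  # top-right
            tuple(pts[s.argmax()]),  # bottom-right
            tuple(pts[d.argmin()]))  # bottom-left
```

For an axis-aligned rectangular mask this recovers the rectangle's corners exactly; for a skewed page region it approximates the bounding quadrilateral.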

* HIP 2017 (in submission) 

Convolutional Neural Networks for Font Classification

Aug 11, 2017
Chris Tensmeyer, Daniel Saunders, Tony Martinez

[Figures 1–4]

Classifying pages or text lines into font categories aids transcription because single-font Optical Character Recognition (OCR) is generally more accurate than omni-font OCR. We present a simple framework based on Convolutional Neural Networks (CNNs), where a CNN is trained to classify small patches of text into predefined font classes. To classify page or line images, we average the CNN predictions over densely extracted patches. We show that this method achieves state-of-the-art performance on a challenging dataset of 40 Arabic computer fonts with 98.8% line level accuracy. This same method also achieves the highest reported accuracy of 86.6% in predicting paleographic scribal script classes at the page level on medieval Latin manuscripts. Finally, we analyze what features are learned by the CNN on Latin manuscripts and find evidence that the CNN is learning both the defining morphological differences between scribal script classes as well as overfitting to class-correlated nuisance factors. We propose a novel form of data augmentation that improves robustness to text darkness, further increasing classification performance.

* ICDAR 2017 

Document Image Binarization with Fully Convolutional Neural Networks

Aug 10, 2017
Chris Tensmeyer, Tony Martinez

[Figures 1–4]

Binarization of degraded historical manuscript images is an important pre-processing step for many document processing tasks. We formulate binarization as a pixel classification learning task and apply a novel Fully Convolutional Network (FCN) architecture that operates at multiple image scales, including full resolution. The FCN is trained to optimize a continuous version of the Pseudo F-measure metric, and an ensemble of FCNs outperforms the competition winners on 4 of 7 DIBCO competitions. This same binarization technique can also be applied to different domains, such as Palm Leaf Manuscripts, with good performance. We analyze the performance of the proposed model with respect to the architectural hyperparameters, the size and diversity of training data, and the input features chosen.
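The key idea of training on a continuous F-measure can be sketched as a relaxed loss: replacing hard pixel counts with sums of predicted probabilities makes the F-measure differentiable. This is a sketch of a plain soft F-measure; the paper's Pseudo F-measure additionally weights recall per pixel (e.g. by proximity to text strokes), which is omitted here.

```python
import numpy as np

def soft_fmeasure_loss(pred, gt, eps=1e-8):
    # pred: predicted foreground probabilities in [0, 1].
    # gt: binary ground-truth foreground mask.
    # Soft true positives: sum of probabilities on true foreground pixels.
    tp = (pred * gt).sum()
    precision = tp / (pred.sum() + eps)
    recall = tp / (gt.sum() + eps)
    f = 2 * precision * recall / (precision + recall + eps)
    return 1.0 - f  # minimize 1 - F so that training maximizes F
```

A perfect prediction drives the loss to (nearly) zero, while a prediction with no overlap with the ground truth yields the maximum loss of 1.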

* ICDAR 2017 (oral) 

Analysis of Convolutional Neural Networks for Document Image Classification

Aug 10, 2017
Chris Tensmeyer, Tony Martinez

[Figures 1–4]

Convolutional Neural Networks (CNNs) are state-of-the-art models for document image classification tasks. However, many of these approaches rely on parameters and architectures designed for classifying natural images, which differ from document images. We question whether this is appropriate and conduct a large empirical study to find what aspects of CNNs most affect performance on document images. Among other results, we exceed the state-of-the-art on the RVL-CDIP dataset by using shear transform data augmentation and an architecture designed for a larger input image. Additionally, we analyze the learned features and find evidence that CNNs trained on RVL-CDIP learn region-specific layout features.
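The shear transform augmentation mentioned above can be sketched as shifting each row of the image proportionally to its height, slanting the document content. This is a simple nearest-neighbor sketch; the paper's exact shear range and implementation are assumptions.

```python
import numpy as np

def shear_x(image, shear):
    # Horizontal shear: row y is shifted right by round(shear * y) pixels.
    # Pixels shifted in from outside the image are left as zeros.
    h, w = image.shape
    out = np.zeros_like(image)
    for y in range(h):
        offset = int(round(shear * y))
        xs = np.arange(w) - offset      # source column for each output column
        valid = (xs >= 0) & (xs < w)
        out[y, valid] = image[y, xs[valid]]
    return out
```

Applying random shears during training exposes the CNN to the slant variation found in scanned documents, which the study found to help more than augmentations designed for natural images.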

* Accepted ICDAR 2017 