Ana Paula G. S. de Almeida

Sequence-aware multimodal page classification of Brazilian legal documents

Jul 15, 2022

Pedro H. Luz de Araujo, Ana Paula G. S. de Almeida, Fabricio A. Braz, Nilton C. da Silva, Flavio de Barros Vidal, Teofilo E. de Campos

Abstract: The Brazilian Supreme Court receives tens of thousands of cases each semester. Court employees spend thousands of hours on the initial analysis and classification of those cases, which takes effort away from later, more complex stages of the case management workflow. In this paper, we explore multimodal classification of documents from Brazil's Supreme Court. We train and evaluate our methods on a novel multimodal dataset of 6,510 lawsuits (339,478 pages) with manual annotation assigning each page to one of six classes. Each lawsuit is an ordered sequence of pages, which are stored both as an image and as a corresponding text extracted through optical character recognition. We first train two unimodal classifiers: a ResNet pre-trained on ImageNet is fine-tuned on the images, and a convolutional network with filters of multiple kernel sizes is trained from scratch on document texts. We use them as extractors of visual and textual features, which are then combined through our proposed Fusion Module. Our Fusion Module can handle missing textual or visual input by using learned embeddings for missing data. Moreover, we experiment with bi-directional Long Short-Term Memory (biLSTM) networks and linear-chain conditional random fields to model the sequential nature of the pages. The multimodal approaches outperform both textual and visual classifiers, especially when leveraging the sequential nature of the pages.
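The idea of substituting an embedding for a missing modality can be illustrated with a minimal sketch. This is not the paper's Fusion Module: the class name `FusionSketch`, the use of plain concatenation as the combination step, and the randomly initialised placeholder vectors (which stand in for embeddings that would be learned during training) are all assumptions made for illustration.

```python
import random
from typing import List, Optional, Sequence


class FusionSketch:
    """Toy fusion step (hypothetical): concatenate visual and textual
    feature vectors, substituting a placeholder vector when one modality
    is missing."""

    def __init__(self, visual_dim: int, text_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.visual_dim = visual_dim
        self.text_dim = text_dim
        # Assumption: in a trained model these would be learned parameters.
        # Here they are only randomly initialised stand-ins.
        self.missing_visual = [rng.gauss(0.0, 0.1) for _ in range(visual_dim)]
        self.missing_text = [rng.gauss(0.0, 0.1) for _ in range(text_dim)]

    def fuse(
        self,
        visual: Optional[Sequence[float]],
        text: Optional[Sequence[float]],
    ) -> List[float]:
        # Use the placeholder whenever a modality is absent (None).
        v = list(self.missing_visual) if visual is None else list(visual)
        t = list(self.missing_text) if text is None else list(text)
        if len(v) != self.visual_dim or len(t) != self.text_dim:
            raise ValueError("feature size does not match the configured dimensions")
        # Assumption: fusion by simple concatenation.
        return v + t


fusion = FusionSketch(visual_dim=3, text_dim=2)
both = fusion.fuse([1.0, 2.0, 3.0], [0.5, 0.25])
no_text = fusion.fuse([1.0, 2.0, 3.0], None)  # OCR text unavailable for this page
```

In both calls the fused vector has the same length, so a downstream classifier or sequence model would see a fixed-size input whether or not a page has usable text or image data.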

* International Journal on Document Analysis and Recognition, 2022
* 11 pages, 6 figures. This preprint, which was originally written on 8 April 2021, has not undergone peer review or any post-submission improvements or corrections. The Version of Record of this article is published in the International Journal on Document Analysis and Recognition, and is available online at https://doi.org/10.1007/s10032-022-00406-7 and https://rdcu.be/cRvvV

Turning old models fashion again: Recycling classical CNN networks using the Lattice Transformation

Sep 28, 2021

Ana Paula G. S. de Almeida, Flavio de Barros Vidal

Abstract: In the early 1990s, the first signs of life of the CNN era were given: LeCun et al. proposed a CNN model trained by the backpropagation algorithm to classify low-resolution images of handwritten digits. Undoubtedly, it was a breakthrough in the field of computer vision. But with the rise of other classification methods, it fell out of fashion. That was until 2012, when Krizhevsky et al. revived the interest in CNNs by exhibiting considerably higher image classification accuracy on the ImageNet challenge. Since then, the complexity of the architectures has been increasing exponentially and many structures are rapidly becoming obsolete. Using multistream networks as a base and the feature infusion precept, we explore the proposed LCNN cross-fusion strategy to use the backbones of former state-of-the-art networks on image classification in order to discover if the technique is able to put these designs back in the game. In this paper, we show that we can obtain an accuracy increase of up to 63.21% on the NORB dataset when compared with the original structure. However, no technique is definitive. While our goal is to try to reuse previous state-of-the-art architectures with few modifications, we also expose the disadvantages of our explored strategy.

* 21 pages, 13 figures

L-CNN: A Lattice cross-fusion strategy for multistream convolutional neural networks

Aug 01, 2020

Ana Paula G. S. de Almeida, Flavio de Barros Vidal

Abstract: This paper proposes a fusion strategy for multistream convolutional networks, the Lattice Cross Fusion. This approach crosses signals from convolution layers, performing mathematical operation-based fusions right before pooling layers. Results on a purposely worsened CIFAR-10, a popular image classification data set, with a modified AlexNet-LCNN version show that this novel method outperforms the baseline single-stream network by 46%, with faster convergence, stability, and robustness.
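The notion of crossing two streams with an element-wise operation just before pooling can be sketched roughly as follows. This is a toy illustration and not the authors' implementation: the function names, the 1-D feature maps, the choice of max pooling, and the example operations (sum and difference) are all assumptions.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def elementwise(op: Callable[[float, float], float], a: Vector, b: Vector) -> Vector:
    """Apply a binary operation position by position to two equal-length maps."""
    if len(a) != len(b):
        raise ValueError("feature maps must have the same length")
    return [op(x, y) for x, y in zip(a, b)]


def max_pool_1d(x: Vector, size: int = 2) -> Vector:
    """Non-overlapping 1-D max pooling; a trailing partial window is dropped."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]


def cross_fuse_then_pool(
    stream_a: Vector,
    stream_b: Vector,
    op: Callable[[float, float], float],
    pool_size: int = 2,
) -> Tuple[Vector, Vector]:
    """Hypothetical cross-fusion step: each stream receives its own
    convolution output combined with the other stream's output, and the
    fused signal is then pooled."""
    fused_a = elementwise(op, stream_a, stream_b)
    fused_b = elementwise(op, stream_b, stream_a)
    return max_pool_1d(fused_a, pool_size), max_pool_1d(fused_b, pool_size)


# Stand-ins for the outputs of one convolution layer in each stream.
conv_a = [1.0, 4.0, 2.0, 0.0]
conv_b = [0.0, 1.0, 3.0, 5.0]

sum_a, sum_b = cross_fuse_then_pool(conv_a, conv_b, lambda x, y: x + y)
diff_a, diff_b = cross_fuse_then_pool(conv_a, conv_b, lambda x, y: x - y)
```

With a symmetric operation such as the sum, both streams carry the same fused signal after this step, whereas a non-symmetric operation such as the difference keeps the two streams distinct.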

* Electronics Letters, vol. 55, no. 22, pp. 1180-1182, 2019
* 5 pages, 3 figures
