
"Text": models, code, and papers

On-Device Document Classification using multimodal features

Jan 06, 2021
Sugam Garg, Harichandana, Sumit Kumar

From small screenshots to long videos, documents take up the bulk of storage in a modern smartphone. Documents accumulate from various sources, and with the high storage capacity of modern phones, hundreds can pile up in a short period. However, searching or managing documents remains an onerous task, since most search methods depend on meta-information or on the text alone. In this paper, we show that a single modality is insufficient for classification and present a novel pipeline to classify documents on-device, thus preventing any transfer of private user data to a server. For this task, we integrate an open-source library for Optical Character Recognition (OCR) and our novel model architecture into the pipeline. We optimise the model for size, a necessary metric for on-device inference. We benchmark our classification model on the standard multimodal dataset FOOD-101 and achieve results competitive with the previous state of the art at 30% model compression.

* 8th ACM IKDD CODS and 26th COMAD, 2-4 January 2021 
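The multimodal fusion the abstract describes can be sketched as a simple late-fusion classifier. The feature sizes, class count, and single linear layer below are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

def fuse_and_classify(text_feat, image_feat, W, b):
    """Late-fusion sketch: L2-normalize each modality, concatenate,
    then apply one linear layer followed by a softmax."""
    t = text_feat / (np.linalg.norm(text_feat) + 1e-8)
    v = image_feat / (np.linalg.norm(image_feat) + 1e-8)
    x = np.concatenate([t, v])
    logits = W @ x + b
    e = np.exp(logits - logits.max())
    return e / e.sum()

rng = np.random.default_rng(0)
text_feat = rng.normal(size=64)       # e.g. an OCR-text embedding
image_feat = rng.normal(size=64)      # e.g. a CNN image embedding
W = rng.normal(size=(5, 128)) * 0.1   # 5 hypothetical document classes
b = np.zeros(5)
probs = fuse_and_classify(text_feat, image_feat, W, b)
```

In practice the fusion layer would be trained jointly with both encoders; this sketch only shows how the two modalities are combined at inference time.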


Prefix-Tuning: Optimizing Continuous Prompts for Generation

Jan 01, 2021
Xiang Lisa Li, Percy Liang

Fine-tuning is the de facto way to leverage large pretrained language models to perform downstream tasks. However, it modifies all the language model parameters and therefore necessitates storing a full copy for each task. In this paper, we propose prefix-tuning, a lightweight alternative to fine-tuning for natural language generation tasks, which keeps language model parameters frozen, but optimizes a small continuous task-specific vector (called the prefix). Prefix-tuning draws inspiration from prompting, allowing subsequent tokens to attend to this prefix as if it were "virtual tokens". We apply prefix-tuning to GPT-2 for table-to-text generation and to BART for summarization. We find that by learning only 0.1% of the parameters, prefix-tuning obtains comparable performance in the full data setting, outperforms fine-tuning in low-data settings, and extrapolates better to examples with topics unseen during training.
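The "virtual tokens" idea can be illustrated with a single attention head: trainable prefix key/value vectors are prepended to the frozen model's keys and values, so real tokens attend to the prefix exactly as they would to preceding tokens. The dimensions below are arbitrary, and the actual method reparametrises the prefix through an MLP during training, which this sketch omits:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def prefix_attention(Q, K, V, prefix_K, prefix_V):
    """Single-head attention where trainable prefix key/value vectors
    ("virtual tokens") are prepended to the frozen model's K and V."""
    K_all = np.concatenate([prefix_K, K], axis=0)
    V_all = np.concatenate([prefix_V, V], axis=0)
    scores = Q @ K_all.T / np.sqrt(Q.shape[-1])
    return softmax(scores, axis=-1) @ V_all

rng = np.random.default_rng(0)
d, seq, plen = 16, 6, 3
Q, K, V = (rng.normal(size=(seq, d)) for _ in range(3))
prefix_K = rng.normal(size=(plen, d))  # trainable; everything else frozen
prefix_V = rng.normal(size=(plen, d))  # trainable
out = prefix_attention(Q, K, V, prefix_K, prefix_V)
```

Only `prefix_K` and `prefix_V` would receive gradients, which is why a single frozen model copy can serve many tasks.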


Nonnegative Matrix Factorization with Zellner Penalty

Dec 07, 2020
Matthew Corsetti, Ernest Fokoué

Nonnegative matrix factorization (NMF) is a relatively new unsupervised learning algorithm that decomposes a nonnegative data matrix into a parts-based, lower dimensional, linear representation of the data. NMF has applications in image processing, text mining, recommendation systems and a variety of other fields. Since its inception, the NMF algorithm has been modified and explored by numerous authors. One such modification involves the addition of auxiliary constraints to the objective function of the factorization. The purpose of these auxiliary constraints is to impose task-specific penalties or restrictions on the objective function. Though many auxiliary constraints have been studied, none have made use of data-dependent penalties. In this paper, we propose Zellner nonnegative matrix factorization (ZNMF), which uses data-dependent auxiliary constraints. We assess the facial recognition performance of the ZNMF algorithm and several other well-known constrained NMF algorithms using the Cambridge ORL database.

* Open Journal of Statistics 5 (2015) 777-786 
* 10 pages, 4 figures, 2 tables 
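For reference, the unconstrained factorization that ZNMF extends can be computed with the classic Lee-Seung multiplicative updates. The sketch below minimizes the plain objective ||V - WH||_F^2 with no penalty term, so it is the baseline NMF, not the Zellner variant:

```python
import numpy as np

def nmf(V, rank, iters=200, seed=0):
    """Lee-Seung multiplicative updates for ||V - W H||_F^2.
    Updates keep W and H nonnegative by construction."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + 0.1
    H = rng.random((rank, n)) + 0.1
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H

V = np.abs(np.random.default_rng(1).random((20, 15)))  # toy nonnegative data
W, H = nmf(V, rank=5)
err = np.linalg.norm(V - W @ H)
```

Auxiliary constraints such as the Zellner penalty enter as extra terms in the objective, which modify these update rules accordingly.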


Event Guided Denoising for Multilingual Relation Learning

Dec 04, 2020
Amith Ananthram, Emily Allaway, Kathleen McKeown

General purpose relation extraction has recently seen considerable gains in part due to a massively data-intensive distant supervision technique from Soares et al. (2019) that produces state-of-the-art results across many benchmarks. In this work, we present a methodology for collecting high quality training data for relation extraction from unlabeled text that achieves a near-recreation of their zero-shot and few-shot results at a fraction of the training cost. Our approach exploits the predictable distributional structure of date-marked news articles to build a denoised corpus -- the extraction process filters out low quality examples. We show that a smaller multilingual encoder trained on this corpus performs comparably to the current state-of-the-art (when both receive little to no fine-tuning) on few-shot and standard relation benchmarks in English and Spanish despite using many fewer examples (50K vs. 300M+).

* COLING2020, short paper 
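The burst intuition behind date-marked denoising can be sketched as a simple filter: keep an entity pair only if its mentions cluster in time, on the assumption that event-driven news repeats true relations in bursts. The window and count thresholds below are hypothetical, not the paper's settings:

```python
from collections import defaultdict
from datetime import date

def burst_filter(mentions, window_days=3, min_count=2):
    """Keep an entity pair only if it appears at least `min_count`
    times within any `window_days`-day span of article dates."""
    by_pair = defaultdict(list)
    for pair, d in mentions:
        by_pair[pair].append(d)
    kept = set()
    for pair, dates in by_pair.items():
        dates.sort()
        for i in range(len(dates)):
            j = i
            while j + 1 < len(dates) and (dates[j + 1] - dates[i]).days <= window_days:
                j += 1
            if j - i + 1 >= min_count:
                kept.add(pair)
                break
    return kept

mentions = [
    (("Acme", "Globex"), date(2020, 1, 1)),
    (("Acme", "Globex"), date(2020, 1, 2)),
    (("Acme", "Initech"), date(2020, 1, 1)),  # isolated mention, filtered out
]
kept = burst_filter(mentions)
```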


CUED_speech at TREC 2020 Podcast Summarisation Track

Dec 04, 2020
Potsawee Manakul, Mark Gales

In this paper, we describe our approach for the Podcast Summarisation challenge in TREC 2020. Given a podcast episode with its transcription, the goal is to generate a summary that captures the most important information in the content. Our approach consists of two steps: (1) Filtering redundant or less informative sentences in the transcription using the attention of a hierarchical model; (2) Applying a state-of-the-art text summarisation system (BART) fine-tuned on the Podcast data using a sequence-level reward function. Furthermore, we perform ensembles of three and nine models for our submission runs. We also fine-tune the BART model on the Podcast data as our baseline. The human evaluation by NIST shows that our best submission achieves 1.777 in the EGFB scale, while the score of creator-provided description is 1.291. Our system won the Spotify Podcast Summarisation Challenge in the TREC2020 Podcast Track in both human and automatic evaluation.

* TREC 2020 
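Step (1) of the pipeline can be sketched as attention-based sentence filtering: score each transcript sentence, keep the top fraction in original order, and pass only those to the summariser. The keep ratio and scores below are illustrative; the real system derives the scores from a hierarchical model:

```python
import numpy as np

def filter_sentences(sentences, attn_scores, keep_ratio=0.5):
    """Drop low-attention sentences before summarisation,
    preserving the original order of the sentences kept."""
    k = max(1, int(len(sentences) * keep_ratio))
    top = set(np.argsort(attn_scores)[-k:])
    return [s for i, s in enumerate(sentences) if i in top]

sentences = ["intro chatter", "key claim", "ad read", "main finding"]
scores = np.array([0.05, 0.40, 0.10, 0.45])  # hypothetical attention mass
kept = filter_sentences(sentences, scores)
```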


ReMix: Calibrated Resampling for Class Imbalance in Deep Learning

Dec 03, 2020
Colin Bellinger, Roberto Corizzo, Nathalie Japkowicz

Class imbalance is a problem of significant importance in applied deep learning, where trained models support or automate decisions in critical areas such as health and medicine, transportation, and finance. Learning deep models from imbalanced training data remains challenging, and state-of-the-art solutions are typically data-dependent and focused primarily on image data. Real-world imbalanced classification problems, however, are much more diverse, necessitating a general solution that can be applied to tabular, image, and text data. In this paper, we propose ReMix, a training technique that leverages batch resampling, instance mixing, and soft labels to enable the induction of robust deep models for imbalanced learning. Our results show that dense nets and CNNs trained with ReMix generally outperform the alternatives according to the g-mean and are better calibrated according to the balanced Brier score.
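The general recipe (batch resampling, instance mixing, soft labels) can be sketched with NumPy. This is an illustrative combination of class-balanced sampling and mixup-style interpolation, not the paper's exact procedure:

```python
import numpy as np

def remix_batch(X, y, n_classes, batch_size=8, alpha=0.2, seed=0):
    """Sketch: draw a class-balanced batch, then mix pairs of
    instances and their one-hot labels to produce soft labels."""
    rng = np.random.default_rng(seed)
    # balanced resampling: pick a class uniformly, then an instance of it
    idx = np.array([rng.choice(np.flatnonzero(y == rng.integers(n_classes)))
                    for _ in range(batch_size)])
    Xb = X[idx]
    Yb = np.eye(n_classes)[y[idx]]
    # instance mixing with Beta-distributed weights
    lam = rng.beta(alpha, alpha, size=(batch_size, 1))
    perm = rng.permutation(batch_size)
    X_mix = lam * Xb + (1 - lam) * Xb[perm]
    Y_mix = lam * Yb + (1 - lam) * Yb[perm]   # soft labels
    return X_mix, Y_mix

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 4))
y = np.array([0] * 8 + [1] * 2)  # imbalanced toy labels
X_mix, Y_mix = remix_batch(X, y, n_classes=2)
```

Because classes are drawn uniformly before instances, the minority class appears in roughly half the batch despite its 20% prevalence in the data.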


The Geometry of Distributed Representations for Better Alignment, Attenuated Bias, and Improved Interpretability

Nov 25, 2020
Sunipa Dev

High-dimensional representations for words, text, images, knowledge graphs and other structured data are commonly used in different paradigms of machine learning and data mining. These representations have different degrees of interpretability, with efficient distributed representations coming at the cost of losing the feature-to-dimension mapping. This implies that there is obfuscation in the way concepts are captured in these embedding spaces. Its effects are seen in many representations and tasks, a particularly problematic case being language representations, where societal biases learned from the underlying data are captured and occluded in unknown dimensions and subspaces. As a result, invalid associations (such as different races and their association with a polar notion of good versus bad) are made and propagated by the representations, leading to unfair outcomes in the different tasks where they are used. This work addresses some of these problems pertaining to the transparency and interpretability of such representations. A primary focus is the detection, quantification, and mitigation of socially biased associations in language representation.

* PhD thesis, University of Utah (2020) 
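One standard mitigation technique in this line of work is linear projection: once a bias direction has been identified in the embedding space, its component is removed from each word vector. A minimal sketch, with a hypothetical bias direction:

```python
import numpy as np

def debias(vec, bias_dir):
    """Remove the component of an embedding along an identified
    bias direction (projection-based linear debiasing)."""
    b = bias_dir / np.linalg.norm(bias_dir)
    return vec - (vec @ b) * b

bias_dir = np.array([1.0, 0.0, 0.0])  # hypothetical learned bias direction
v = np.array([0.7, 0.2, 0.1])         # toy word vector
v_clean = debias(v, bias_dir)
```

After projection the vector is exactly orthogonal to the bias direction, so the invalid association carried along that axis can no longer influence downstream similarity scores.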


Data Augmentation for End-to-end Code-switching Speech Recognition

Nov 04, 2020
Chenpeng Du, Hao Li, Yizhou Lu, Lan Wang, Yanmin Qian

Training a code-switching end-to-end automatic speech recognition (ASR) model normally requires a large amount of data, while code-switching data is often limited. In this paper, three novel approaches are proposed for code-switching data augmentation: audio splicing with the existing code-switching data, and TTS with new code-switching texts generated by word translation or word insertion. Our experiments on a 200-hour Mandarin-English code-switching dataset show that all three proposed approaches individually yield significant improvements on code-switching ASR. Moreover, all of them can be combined with the recently popular SpecAugment, and an additional gain is obtained. WER is reduced by a relative 24.0% compared to the system without any data augmentation, and by a relative 13.0% compared to the system with only SpecAugment.

* Accepted by SLT2021 
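The audio-splicing idea can be sketched as waveform concatenation across segments from different utterances. The short linear crossfade and its length below are assumptions added for illustration, not details from the paper:

```python
import numpy as np

def splice(segments, sr=16000, crossfade_ms=10):
    """Concatenate waveform segments with a short linear crossfade
    to soften the joins between spliced utterances."""
    n_fade = int(sr * crossfade_ms / 1000)
    out = segments[0]
    for seg in segments[1:]:
        fade = np.linspace(0, 1, n_fade)
        overlap = out[-n_fade:] * (1 - fade) + seg[:n_fade] * fade
        out = np.concatenate([out[:-n_fade], overlap, seg[n_fade:]])
    return out

sr = 16000
a = np.ones(sr)    # 1 s stand-in for a Mandarin segment (dummy waveform)
b = -np.ones(sr)   # 1 s stand-in for an English segment
spliced = splice([a, b], sr=sr)
```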


Multitask Learning and Joint Optimization for Transformer-RNN-Transducer Speech Recognition

Nov 02, 2020
Jae-Jin Jeon, Eesung Kim

Recently, several types of end-to-end speech recognition methods named transformer-transducer were introduced. In these methods, the transcription network is generally modeled by a transformer-based neural network, while the prediction network can be modeled by either a transformer or a recurrent neural network (RNN). This paper explores multitask learning, joint optimization, and joint decoding methods for transformer-RNN-transducer systems. Our proposed methods have the main advantage that the model can maintain information from the large text corpus. We demonstrate their effectiveness through experiments with the well-known ESPnet toolkit on the widely used LibriSpeech dataset. We also show that the proposed methods can reduce the word error rate (WER) by 16.6% and 13.3% on the test-clean and test-other sets, respectively, without changing the overall model structure or exploiting an external LM.
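Joint decoding over two prediction networks can be sketched as per-step interpolation of their log-probabilities; the interpolation weight below is a hypothetical tuning knob, not a value from the paper:

```python
import numpy as np

def joint_decode_step(logp_transformer, logp_rnn, weight=0.5):
    """Combine per-token log-probabilities from two prediction
    networks and pick the argmax for this decoding step."""
    combined = weight * logp_transformer + (1 - weight) * logp_rnn
    return int(np.argmax(combined)), combined

# toy 3-token vocabulary distributions from each network
logp_t = np.log(np.array([0.1, 0.7, 0.2]))
logp_r = np.log(np.array([0.2, 0.5, 0.3]))
tok, scores = joint_decode_step(logp_t, logp_r)
```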


Towards End-to-End Training of Automatic Speech Recognition for Nigerian Pidgin

Oct 21, 2020
Daniel Ajisafe, Oluwabukola Adegboro, Esther Oduntan, Tayo Arulogun

Nigerian Pidgin remains one of the most popular languages in West Africa. With at least 75 million speakers along the West African coast, the language has spread to diasporic communities through Nigerian immigrants in England, Canada, and America, amongst others. Yet it remains under-resourced in natural language processing, particularly for speech recognition and translation tasks. In this work, we present the first parallel (speech-to-text) data on Nigerian Pidgin. We also trained the first end-to-end speech recognition systems (QuartzNet and Jasper models) for this language, both optimized using Connectionist Temporal Classification (CTC) loss. As a baseline, we achieve a low word error rate (WER) of 0.77% using a greedy decoder on our dataset. Finally, we open-source the data and code along with this publication to encourage future research in this direction.

* To appear in ICASSP 2021 
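Since both models are trained with CTC loss and decoded greedily, the decoding step can be sketched as the standard argmax-collapse-deblank procedure (the token ids below are arbitrary illustrations):

```python
def ctc_greedy_decode(frame_ids, blank=0):
    """Greedy CTC decoding: given per-frame argmax ids, collapse
    consecutive repeats, then drop the blank symbol."""
    out, prev = [], None
    for i in frame_ids:
        if i != prev and i != blank:
            out.append(i)
        prev = i
    return out

# frames spelling "hello" with id 0 as blank: _ h h _ e l l _ l o
decoded = ctc_greedy_decode([0, 8, 8, 0, 5, 12, 12, 0, 12, 15])
```

Note that the blank between the two `l` frames is what allows the repeated letter to survive the collapse step.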
