Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"Text": models, code, and papers

PnPOOD : Out-Of-Distribution Detection for Text Classification via Plug andPlay Data Augmentation

Oct 31, 2021
Mrinal Rawat, Ramya Hebbalaguppe, Lovekesh Vig

While Out-of-distribution (OOD) detection has been well explored in computer vision, there have been relatively few prior attempts in OOD detection for NLP classification. In this paper we argue that these prior attempts do not fully address the OOD problem and may suffer from data leakage and poor calibration of the resulting models. We present PnPOOD, a data augmentation technique to perform OOD detection via out-of-domain sample generation using the recently proposed Plug and Play Language Model (Dathathri et al., 2020). Our method generates high quality discriminative samples close to the class boundaries, resulting in accurate OOD detection at test time. We demonstrate that our model outperforms prior models on OOD sample detection, and exhibits lower calibration error on the 20 newsgroup text and Stanford Sentiment Treebank dataset (Lang, 1995; Socheret al., 2013). We further highlight an important data leakage issue with datasets used in prior attempts at OOD detection, and share results on a new dataset for OOD detection that does not suffer from the same problem.

  Access Paper or Ask Questions

A Survey on Machine Learning Techniques for Auto Labeling of Video, Audio, and Text Data

Sep 08, 2021
Shikun Zhang, Omid Jafari, Parth Nagarkar

Machine learning has been utilized to perform tasks in many different domains such as classification, object detection, image segmentation and natural language analysis. Data labeling has always been one of the most important tasks in machine learning. However, labeling large amounts of data increases the monetary cost in machine learning. As a result, researchers started to focus on reducing data annotation and labeling costs. Transfer learning was designed and widely used as an efficient approach that can reasonably reduce the negative impact of limited data, which in turn, reduces the data preparation cost. Even transferring previous knowledge from a source domain reduces the amount of data needed in a target domain. However, large amounts of annotated data are still demanded to build robust models and improve the prediction accuracy of the model. Therefore, researchers started to pay more attention on auto annotation and labeling. In this survey paper, we provide a review of previous techniques that focuses on optimized data annotation and labeling for video, audio, and text data.

  Access Paper or Ask Questions

Toward Collaborative Reinforcement Learning Agents that Communicate Through Text-Based Natural Language

Jul 21, 2021
Kevin Eloff, Herman A. Engelbrecht

Communication between agents in collaborative multi-agent settings is in general implicit or a direct data stream. This paper considers text-based natural language as a novel form of communication between multiple agents trained with reinforcement learning. This could be considered first steps toward a truly autonomous communication without the need to define a limited set of instructions, and natural collaboration between humans and robots. Inspired by the game of Blind Leads, we propose an environment where one agent uses natural language instructions to guide another through a maze. We test the ability of reinforcement learning agents to effectively communicate through discrete word-level symbols and show that the agents are able to sufficiently communicate through natural language with a limited vocabulary. Although the communication is not always perfect English, the agents are still able to navigate the maze. We achieve a BLEU score of 0.85, which is an improvement of 0.61 over randomly generated sequences while maintaining a 100% maze completion rate. This is a 3.5 times the performance of the random baseline using our reference set.

* In: 2021 Southern African Universities Power Engineering Conference/Robotics and Mechatronics/Pattern Recognition Association of South Africa (SAUPEC/RobMech/PRASA), 2021, pp. 1-6 
* 5 pages, 6 figures, 3 tables; published in 2021 Southern African Universities Power Engineering Conference/Robotics and Mechatronics/Pattern Recognition Association of South Africa (SAUPEC/RobMech/PRASA), 2021, pp. 1-6, (c) 2021 IEEE 

  Access Paper or Ask Questions

A Character-Level Approach to the Text Normalization Problem Based on a New Causal Encoder

Mar 06, 2019
Adrián Javaloy Bornás, Ginés García Mateos

Text normalization is a ubiquitous process that appears as the first step of many Natural Language Processing problems. However, previous Deep Learning approaches have suffered from so-called silly errors, which are undetectable on unsupervised frameworks, making those models unsuitable for deployment. In this work, we make use of an attention-based encoder-decoder architecture that overcomes these undetectable errors by using a fine-grained character-level approach rather than a word-level one. Furthermore, our new general-purpose encoder based on causal convolutions, called Causal Feature Extractor (CFE), is introduced and compared to other common encoders. The experimental results show the feasibility of this encoder, which leverages the attention mechanisms the most and obtains better results in terms of accuracy, number of parameters and convergence time. While our method results in a slightly worse initial accuracy (92.74%), errors can be automatically detected and, thus, more readily solved, obtaining a more robust model for deployment. Furthermore, there is still plenty of room for future improvements that will push even further these advantages.

* 19 pages, 14 figures, journal 

  Access Paper or Ask Questions

Statistical Texture Features based Handwritten and Printed Text Classification in South Indian Documents

Mar 13, 2013
Mallikarjun Hangarge, K. C. Santosh, Srikanth Doddamani, Rajmohan Pardeshi

In this paper, we use statistical texture features for handwritten and printed text classification. We primarily aim for word level classification in south Indian scripts. Words are first extracted from the scanned document. For each extracted word, statistical texture features are computed such as mean, standard deviation, smoothness, moment, uniformity, entropy and local range including local entropy. These feature vectors are then used to classify words via k-NN classifier. We have validated the approach over several different datasets. Scripts like Kannada, Telugu, Malayalam and Hindi i.e., Devanagari are primarily employed where an average classification rate of 99.26% is achieved. In addition, to provide an extensibility of the approach, we address Roman script by using publicly available dataset and interesting results are reported.

* Appeared in ICECIT-2102 

  Access Paper or Ask Questions

Meta-TTS: Meta-Learning for Few-Shot Speaker Adaptive Text-to-Speech

Nov 07, 2021
Sung-Feng Huang, Chyi-Jiunn Lin, Hung-yi Lee

Personalizing a speech synthesis system is a highly desired application, where the system can generate speech with the user's voice with rare enrolled recordings. There are two main approaches to build such a system in recent works: speaker adaptation and speaker encoding. On the one hand, speaker adaptation methods fine-tune a trained multi-speaker text-to-speech (TTS) model with few enrolled samples. However, they require at least thousands of fine-tuning steps for high-quality adaptation, making it hard to apply on devices. On the other hand, speaker encoding methods encode enrollment utterances into a speaker embedding. The trained TTS model can synthesize the user's speech conditioned on the corresponding speaker embedding. Nevertheless, the speaker encoder suffers from the generalization gap between the seen and unseen speakers. In this paper, we propose applying a meta-learning algorithm to the speaker adaptation method. More specifically, we use Model Agnostic Meta-Learning (MAML) as the training algorithm of a multi-speaker TTS model, which aims to find a great meta-initialization to adapt the model to any few-shot speaker adaptation tasks quickly. Therefore, we can also adapt the meta-trained TTS model to unseen speakers efficiently. Our experiments compare the proposed method (Meta-TTS) with two baselines: a speaker adaptation method baseline and a speaker encoding method baseline. The evaluation results show that Meta-TTS can synthesize high speaker-similarity speech from few enrollment samples with fewer adaptation steps than the speaker adaptation baseline and outperforms the speaker encoding baseline under the same training scheme. When the speaker encoder of the baseline is pre-trained with extra 8371 speakers of data, Meta-TTS can still outperform the baseline on LibriTTS dataset and achieve comparable results on VCTK dataset.

* under review 

  Access Paper or Ask Questions

Spatial Pyramid Encoding with Convex Length Normalization for Text-Independent Speaker Verification

Jun 19, 2019
Youngmoon Jung, Younggwan Kim, Hyungjun Lim, Yeunju Choi, Hoirin Kim

In this paper, we propose a new pooling method called spatial pyramid encoding (SPE) to generate speaker embeddings for text-independent speaker verification. We first partition the output feature maps from a deep residual network (ResNet) into increasingly fine sub-regions and extract speaker embeddings from each sub-region through a learnable dictionary encoding layer. These embeddings are concatenated to obtain the final speaker representation. The SPE layer not only generates a fixed-dimensional speaker embedding for a variable-length speech segment, but also aggregates the information of feature distribution from multi-level temporal bins. Furthermore, we apply deep length normalization by augmenting the loss function with ring loss. By applying ring loss, the network gradually learns to normalize the speaker embeddings using model weights themselves while preserving convexity, leading to more robust speaker embeddings. Experiments on the VoxCeleb1 dataset show that the proposed system using the SPE layer and ring loss-based deep length normalization outperforms both i-vector and d-vector baselines.

* 5 pages, 2 figures, Interspeech 2019 

  Access Paper or Ask Questions

Local Citation Recommendation with Hierarchical-Attention Text Encoder and SciBERT-based Reranking

Dec 02, 2021
Nianlong Gu, Yingqiang Gao, Richard H. R. Hahnloser

The goal of local citation recommendation is to recommend a missing reference from the local citation context and optionally also from the global context. To balance the tradeoff between speed and accuracy of citation recommendation in the context of a large-scale paper database, a viable approach is to first prefetch a limited number of relevant documents using efficient ranking methods and then to perform a fine-grained reranking using more sophisticated models. In that vein, BM25 has been found to be a tough-to-beat approach to prefetching, which is why recent work has focused mainly on the reranking step. Even so, we explore prefetching with nearest neighbor search among text embeddings constructed by a hierarchical attention network. When coupled with a SciBERT reranker fine-tuned on local citation recommendation tasks, our hierarchical Attention encoder (HAtten) achieves high prefetch recall for a given number of candidates to be reranked. Consequently, our reranker needs to rerank fewer prefetch candidates, yet still achieves state-of-the-art performance on various local citation recommendation datasets such as ACL-200, FullTextPeerRead, RefSeer, and arXiv.

* Accepted by ECIR 2022: 

  Access Paper or Ask Questions

Ontology-Based Query Expansion with Latently Related Named Entities for Semantic Text Search

Jul 15, 2018
Vuong M. Ngo, Tru H. Cao

Traditional information retrieval systems represent documents and queries by keyword sets. However, the content of a document or a query is mainly defined by both keywords and named entities occurring in it. Named entities have ontological features, namely, their aliases, classes, and identifiers, which are hidden from their textual appearance. Besides, the meaning of a query may imply latent named entities that are related to the apparent ones in the query. We propose an ontology-based generalized vector space model to semantic text search. It exploits ontological features of named entities and their latently related ones to reveal the semantics of documents and queries. We also propose a framework to combine different ontologies to take their complementary advantages for semantic annotation and searching. Experiments on a benchmark dataset show better search quality of our model to other ones.

* 12 pages - accepted by Advances in Intelligent Information and Database Systems, Book of series SCI, Springer-Verlag (2010) 

  Access Paper or Ask Questions