Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"Text": models, code, and papers

Adversarial Training of End-to-end Speech Recognition Using a Criticizing Language Model

Nov 02, 2018
Alexander H. Liu, Hung-yi Lee, Lin-shan Lee

In this paper we proposed a novel Adversarial Training (AT) approach for end-to-end speech recognition using a Criticizing Language Model (CLM). In this way the CLM and the automatic speech recognition (ASR) model can challenge and learn from each other iteratively to improve the performance. Since the CLM only takes the text as input, huge quantities of unpaired text data can be utilized in this approach within end-to-end training. Moreover, AT can be applied to any end-to-end ASR model using any deep-learning-based language modeling frameworks, and compatible with any existing end-to-end decoding method. Initial results with an example experimental setup demonstrated the proposed approach is able to gain consistent improvements efficiently from auxiliary text data under different scenarios.

* under review ICASSP 2019 

  Access Paper or Ask Questions

Parsing Speech: A Neural Approach to Integrating Lexical and Acoustic-Prosodic Information

Apr 15, 2018
Trang Tran, Shubham Toshniwal, Mohit Bansal, Kevin Gimpel, Karen Livescu, Mari Ostendorf

In conversational speech, the acoustic signal provides cues that help listeners disambiguate difficult parses. For automatically parsing spoken utterances, we introduce a model that integrates transcribed text and acoustic-prosodic features using a convolutional neural network over energy and pitch trajectories coupled with an attention-based recurrent neural network that accepts text and prosodic features. We find that different types of acoustic-prosodic features are individually helpful, and together give statistically significant improvements in parse and disfluency detection F1 scores over a strong text-only baseline. For this study with known sentence boundaries, error analyses show that the main benefit of acoustic-prosodic features is in sentences with disfluencies, attachment decisions are most improved, and transcription errors obscure gains from prosody.

* Accepted in NAACL HLT 2018 

  Access Paper or Ask Questions

A Digital Corpus of St. Lawrence Island Yupik

Jan 26, 2021
Lane Schwartz, Emily Chen, Hyunji Hayley Park, Edward Jahn, Sylvia L. R. Schreiner

St. Lawrence Island Yupik (ISO 639-3: ess) is an endangered polysynthetic language in the Inuit-Yupik language family indigenous to Alaska and Chukotka. This work presents a step-by-step pipeline for the digitization of written texts, and the first publicly available digital corpus for St. Lawrence Island Yupik, created using that pipeline. This corpus has great potential for future linguistic inquiry and research in NLP. It was also developed for use in Yupik language education and revitalization, with a primary goal of enabling easy access to Yupik texts by educators and by members of the Yupik community. A secondary goal is to support development of language technology such as spell-checkers, text-completion systems, interactive e-books, and language learning apps for use by the Yupik community.

* ComputEL-4 

  Access Paper or Ask Questions

SemEval-2020 Task 11: Detection of Propaganda Techniques in News Articles

Sep 06, 2020
G. Da San Martino, A. Barrón-Cedeño, H. Wachsmuth, R. Petrov, P. Nakov

We present the results and the main findings of SemEval-2020 Task 11 on Detection of Propaganda Techniques in News Articles. The task featured two subtasks. Subtask SI is about Span Identification: given a plain-text document, spot the specific text fragments containing propaganda. Subtask TC is about Technique Classification: given a specific text fragment, in the context of a full document, determine the propaganda technique it uses, choosing from an inventory of 14 possible propaganda techniques. The task attracted a large number of participants: 250 teams signed up to participate and 44 made a submission on the test set. In this paper, we present the task, analyze the results, and discuss the system submissions and the methods they used. For both subtasks, the best systems used pre-trained Transformers and ensembles.

* 37 pages, to be published in Proceedings of the 14th International Workshop on Semantic Evaluation 

  Access Paper or Ask Questions

FrameAxis: Characterizing Framing Bias and Intensity with Word Embedding

Feb 22, 2020
Haewoon Kwak, Jisun An, Yong-Yeol Ahn

We propose FrameAxis, a method of characterizing the framing of a given text by identifying the most relevant semantic axes ("microframes") defined by antonym word pairs. In contrast to the traditional framing analysis, which has been constrained by a small number of manually annotated general frames, our unsupervised approach provides much more detailed insights, by considering a host of semantic axes. Our method is capable of quantitatively teasing out framing bias -- how biased a text is in each microframe -- and framing intensity -- how much each microframe is used -- from the text, offering a nuanced characterization of framing. We evaluate our approach using SemEval datasets as well as three other datasets and human evaluations, demonstrating that FrameAxis can reliably characterize documents with relevant microframes. Our method may allow scalable and nuanced computational analyses of framing across disciplines.

* 8 pages; typos corrected (email address) 

  Access Paper or Ask Questions

BP-Transformer: Modelling Long-Range Context via Binary Partitioning

Nov 11, 2019
Zihao Ye, Qipeng Guo, Quan Gan, Xipeng Qiu, Zheng Zhang

The Transformer model is widely successful on many natural language processing tasks. However, the quadratic complexity of self-attention limit its application on long text. In this paper, adopting a fine-to-coarse attention mechanism on multi-scale spans via binary partitioning (BP), we propose BP-Transformer (BPT for short). BPT yields $O(k\cdot n\log (n/k))$ connections where $k$ is a hyperparameter to control the density of attention. BPT has a good balance between computation complexity and model capacity. A series of experiments on text classification, machine translation and language modeling shows BPT has a superior performance for long text than previous self-attention models. Our code, hyperparameters and CUDA kernels for sparse attention are available in PyTorch.

  Access Paper or Ask Questions

Probing Contextualized Sentence Representations with Visual Awareness

Nov 07, 2019
Zhuosheng Zhang, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita, Hai Zhao

We present a universal framework to model contextualized sentence representations with visual awareness that is motivated to overcome the shortcomings of the multimodal parallel data with manual annotations. For each sentence, we first retrieve a diversity of images from a shared cross-modal embedding space, which is pre-trained on a large-scale of text-image pairs. Then, the texts and images are respectively encoded by transformer encoder and convolutional neural network. The two sequences of representations are further fused by a simple and effective attention layer. The architecture can be easily applied to text-only natural language processing tasks without manually annotating multimodal parallel corpora. We apply the proposed method on three tasks, including neural machine translation, natural language inference and sequence labeling and experimental results verify the effectiveness.

  Access Paper or Ask Questions

Using Local Knowledge Graph Construction to Scale Seq2Seq Models to Multi-Document Inputs

Oct 18, 2019
Angela Fan, Claire Gardent, Chloe Braud, Antoine Bordes

Query-based open-domain NLP tasks require information synthesis from long and diverse web results. Current approaches extractively select portions of web text as input to Sequence-to-Sequence models using methods such as TF-IDF ranking. We propose constructing a local graph structured knowledge base for each query, which compresses the web search information and reduces redundancy. We show that by linearizing the graph into a structured input sequence, models can encode the graph representations within a standard Sequence-to-Sequence setting. For two generative tasks with very long text input, long-form question answering and multi-document summarization, feeding graph representations as input can achieve better performance than using retrieved text portions.

  Access Paper or Ask Questions

The Language of Legal and Illegal Activity on the Darknet

Jun 04, 2019
Leshem Choshen, Dan Eldad, Daniel Hershcovich, Elior Sulem, Omri Abend

The non-indexed parts of the Internet (the Darknet) have become a haven for both legal and illegal anonymous activity. Given the magnitude of these networks, scalably monitoring their activity necessarily relies on automated tools, and notably on NLP tools. However, little is known about what characteristics texts communicated through the Darknet have, and how well off-the-shelf NLP tools do on this domain. This paper tackles this gap and performs an in-depth investigation of the characteristics of legal and illegal text in the Darknet, comparing it to a clear net website with similar content as a control condition. Taking drug-related websites as a test case, we find that texts for selling legal and illegal drugs have several linguistic characteristics that distinguish them from one another, as well as from the control condition, among them the distribution of POS tags, and the coverage of their named entities in Wikipedia.

* ACL 2019 camera ready; code in 

  Access Paper or Ask Questions