
"Text": models, code, and papers

Audio Adversarial Examples: Attacks Using Vocal Masks

Feb 04, 2021
Lynnette Ng, Kai Yuan Tay, Wei Han Chua, Lucerne Loke, Danqi Ye, Melissa Chua

We construct audio adversarial examples against automatic speech-to-text (STT) systems. Given any audio waveform, we produce another by overlaying an audio vocal mask generated from the original audio. We apply our audio adversarial attack to five state-of-the-art STT systems: DeepSpeech, Julius, Kaldi, [email protected] and CMUSphinx. In addition, we engaged human annotators to transcribe the adversarial audio. Our experiments show that these adversarial examples fool state-of-the-art speech-to-text systems, yet humans are consistently able to pick out the speech. The feasibility of this attack opens a new domain for studying machine and human perception of speech.
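
The overlay step described above can be sketched in a few lines. The mixing weight `alpha` and the clipping to the normalized PCM range are illustrative assumptions; the paper's actual mask-generation procedure is not specified in the abstract:

```python
import numpy as np

def overlay_vocal_mask(audio: np.ndarray, mask: np.ndarray, alpha: float = 0.1) -> np.ndarray:
    """Overlay a scaled vocal mask onto the original waveform (sketch).

    `alpha` is a hypothetical mixing weight trading attack strength
    against human perceptibility.
    """
    n = min(len(audio), len(mask))
    adversarial = audio[:n] + alpha * mask[:n]
    # Keep the result in the valid [-1, 1] range for normalized audio.
    return np.clip(adversarial, -1.0, 1.0)
```

A larger `alpha` would presumably make the attack more effective against STT models but also more audible to human listeners.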

* 9 pages, 1 figure, 2 tables. Submitted to COLING 2020 


Generating Wikipedia Article Sections from Diverse Data Sources

Dec 29, 2020
Mingda Chen, Sam Wiseman, Kevin Gimpel

Datasets for data-to-text generation typically focus either on multi-domain, single-sentence generation or on single-domain, long-form generation. In this work, we create a large-scale dataset, WikiTableT, that pairs Wikipedia sections with their corresponding tabular data and various metadata. WikiTableT contains millions of instances, covering a broad range of topics, as well as a variety of flavors of generation tasks with different levels of flexibility. We benchmark several training and decoding strategies on WikiTableT. Our qualitative analysis shows that the best approaches can generate fluent and high-quality texts, but they sometimes struggle with coherence.
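
A common way to feed such paired table-and-metadata instances to a generation model is to linearize them into a flat token sequence. The tags below are hypothetical and not WikiTableT's actual serialization scheme:

```python
def linearize_table(records, metadata):
    """Linearize (key, value) table records plus metadata into one token
    sequence, a typical input format for data-to-text seq2seq models.
    The <title>/<cell> markers are illustrative, not the dataset's own."""
    parts = [f"<title> {metadata.get('title', '')}"]
    for key, value in records:
        parts.append(f"<cell> {key} : {value}")
    return " ".join(parts)
```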


Non-Linear Multiple Field Interactions Neural Document Ranking

Nov 18, 2020
Kentaro Takiguchi, Niall Twomey, Luis M. Vaquero

Ranking tasks are usually based on the text of the main body of the page and the actions (clicks) of users on the page. Other elements could be leveraged to better contextualise the ranking experience (e.g., text in other fields, the query made by the user, images, etc.). We present one of the first in-depth analyses of field interaction for multiple-field ranking on two separate datasets. While some works have taken advantage of the full document structure, some aspects remain unexplored. In this work, we build on previous analyses to show how query-field interactions, non-linear field interactions, and the architecture of the underlying neural model affect performance.
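
One way to capture non-linear field interactions is to mix per-field relevance scores with a small non-linear layer; this is an illustrative sketch, not the paper's actual neural architecture:

```python
import numpy as np

def field_interaction_score(field_scores, W1, b1, w2):
    """Combine per-field relevance scores (e.g. body, title, anchors)
    with a one-hidden-layer MLP so the ranker can model non-linear
    interactions between fields. All shapes here are illustrative."""
    h = np.tanh(field_scores @ W1 + b1)  # non-linear mixing of field evidence
    return float(h @ w2)                 # scalar document relevance score
```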


Show and Speak: Directly Synthesize Spoken Description of Images

Oct 23, 2020
Xinsheng Wang, Siyuan Feng, Jihua Zhu, Mark Hasegawa-Johnson, Odette Scharenborg

This paper proposes a new model, referred to as the show and speak (SAS) model, which, for the first time, directly synthesizes spoken descriptions of images, bypassing the need for any text or phonemes. The basic structure of SAS is an encoder-decoder architecture that takes an image as input and predicts the spectrogram of speech that describes this image. The final speech audio is obtained from the predicted spectrogram via WaveNet. Extensive experiments on the public benchmark database Flickr8k demonstrate that the proposed SAS is able to synthesize natural spoken descriptions for images, indicating that synthesizing spoken descriptions for images while bypassing text and phonemes is feasible.
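
The image-to-spectrogram encoder-decoder idea can be illustrated with a toy model; all dimensions and the single linear layer per stage are assumptions, and the real SAS model is a trained neural network whose predicted spectrogram is vocoded to audio with WaveNet:

```python
import numpy as np

rng = np.random.default_rng(0)

class ShowAndSpeakSketch:
    """Toy encoder-decoder mapping an image vector to a mel-spectrogram.

    Hypothetical dimensions: a 64-d image feature, a 16-d latent, and an
    80-bin spectrogram of 20 frames.
    """

    def __init__(self, img_dim=64, latent_dim=16, n_frames=20, n_mels=80):
        self.enc = rng.standard_normal((img_dim, latent_dim)) * 0.1
        self.dec = rng.standard_normal((latent_dim, n_frames * n_mels)) * 0.1
        self.n_frames, self.n_mels = n_frames, n_mels

    def forward(self, image_vec):
        z = np.tanh(image_vec @ self.enc)  # encode the image to a latent code
        spec = z @ self.dec                # decode the latent to spectrogram values
        return spec.reshape(self.n_frames, self.n_mels)
```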


Self-Supervised Representation Learning on Document Images

Apr 18, 2020
Adrian Cosma, Mihai Ghidoveanu, Michael Panaitescu-Liess, Marius Popescu

This work analyses the impact of self-supervised pre-training on document images. While previous approaches explore the effect of self-supervision on natural images, we show that patch-based pre-training performs poorly on text document images because of their different structural properties and poor intra-sample semantic information. We propose two context-aware alternatives to improve performance. We also propose a novel method for self-supervision that makes use of the inherent multi-modality of documents (image and text) and performs better than other popular self-supervised methods, including supervised ImageNet pre-training.
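
Patch-based pre-training of the kind analysed here can be illustrated with a context-prediction pretext task, where a model must predict the relative position of two patches. The paper's context-aware variants are not specified in the abstract, so this is a generic sketch:

```python
import numpy as np

def relative_patch_pairs(image, patch=8):
    """Build (anchor, neighbor, position-label) triples for a
    context-prediction pretext task on a 2-D grayscale image.
    Patch size and the two-neighbor labeling are illustrative."""
    H, W = image.shape
    pairs = []
    for i in range(0, H - patch, patch):
        for j in range(0, W - patch, patch):
            anchor = image[i:i + patch, j:j + patch]
            if j + 2 * patch <= W:
                right = image[i:i + patch, j + patch:j + 2 * patch]
                pairs.append((anchor, right, 0))  # label 0: neighbor to the right
            if i + 2 * patch <= H:
                below = image[i + patch:i + 2 * patch, j:j + patch]
                pairs.append((anchor, below, 1))  # label 1: neighbor below
    return pairs
```

A model pre-trained to classify these labels learns spatial structure without any annotations, which is exactly where text documents (repetitive lines of glyphs) give a weaker signal than natural images.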

* 15 pages, 5 figures. Accepted at DAS 2020: IAPR International Workshop on Document Analysis Systems 


Inexpensive Domain Adaptation of Pretrained Language Models: A Case Study on Biomedical Named Entity Recognition

Apr 07, 2020
Nina Poerner, Ulli Waltinger, Hinrich Schütze

Domain adaptation of Pretrained Language Models (PTLMs) is typically achieved by pretraining on in-domain text. While successful, this approach is expensive in terms of hardware, runtime and CO2 emissions. Here, we propose a cheaper alternative: We train Word2Vec on in-domain text and align the resulting word vectors with the input space of a general-domain PTLM (here: BERT). We evaluate on eight biomedical Named Entity Recognition (NER) tasks and compare against the recently proposed BioBERT model (Lee et al., 2020). We cover over 50% of the BioBERT-BERT F1 delta, at 5% of BioBERT's CO2 footprint and 2% of its cloud compute cost.
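
Aligning one embedding space with another is often done with an orthogonal Procrustes map over a shared vocabulary; the paper's exact alignment objective may differ, so treat this as a generic sketch:

```python
import numpy as np

def align_to_ptlm_space(word2vec_mat, ptlm_mat):
    """Solve min_W ||X W - Y||_F with W orthogonal (Procrustes), where
    X holds in-domain Word2Vec vectors and Y the PTLM input embeddings
    for the same words, row-aligned. The orthogonality constraint is a
    common choice for embedding alignment, assumed here."""
    U, _, Vt = np.linalg.svd(word2vec_mat.T @ ptlm_mat)
    return U @ Vt
```

The aligned vectors `X @ W` can then be fed to the PTLM in place of (or alongside) its own input embeddings for in-domain words.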


Aligning Multilingual Word Embeddings for Cross-Modal Retrieval Task

Oct 08, 2019
Alireza Mohammadshahi, Remi Lebret, Karl Aberer

In this paper, we propose a new approach to learn multimodal multilingual embeddings for matching images and their relevant captions in two languages. We combine two existing objective functions to make images and captions close in a joint embedding space while adapting the alignment of word embeddings between existing languages in our model. We show that our approach enables better generalization, achieving state-of-the-art performance on the text-to-image and image-to-text retrieval tasks and the caption-caption similarity task. Two multimodal multilingual datasets are used for evaluation: Multi30k with German and English captions and Microsoft-COCO with English and Japanese captions.
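
Combining a ranking objective with a cross-lingual alignment term can be sketched as follows; the margin, the weighting `lam`, and the exact form of both objectives are illustrative assumptions, not the paper's formulation:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def joint_loss(img, cap_pos, cap_neg, cap_other_lang, margin=0.2, lam=0.5):
    """Sum of (1) a triplet ranking loss pulling an image toward its
    matching caption and away from a non-matching one, and (2) a
    cross-lingual term pulling the two languages' captions together."""
    rank = max(0.0, margin - cosine(img, cap_pos) + cosine(img, cap_neg))
    align = 1.0 - cosine(cap_pos, cap_other_lang)
    return rank + lam * align
```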


Guess who? Multilingual approach for the automated generation of author-stylized poetry

Sep 17, 2018
Alexey Tikhonov, Ivan P. Yamshchikov

This paper addresses the problem of stylized text generation in a multilingual setup. A version of a language model based on a long short-term memory (LSTM) artificial neural network with extended phonetic and semantic embeddings is used for stylized poetry generation. The quality of the resulting poems generated by the network is estimated through bilingual evaluation understudy (BLEU), a survey, and a new cross-entropy-based metric suggested for problems of this type. The experiments show that the proposed model consistently outperforms random-sample and vanilla-LSTM baselines, and humans also tend to associate machine-generated texts with the target author.


End-to-End Automatic Speech Translation of Audiobooks

Feb 12, 2018
Alexandre Bérard, Laurent Besacier, Ali Can Kocabiyikoglu, Olivier Pietquin

We investigate end-to-end speech-to-text translation on a corpus of audiobooks specifically augmented for this task. Previous works investigated the extreme case where source-language transcription is available neither during training nor during decoding; we also study a midway case where source-language transcription is available at training time only. In this case, a single model is trained to decode source speech into target text in a single pass. Experimental results show that it is possible to train compact and efficient end-to-end speech translation models in this setup. We also distribute the corpus and hope that our speech translation baseline on this corpus will be challenged in the future.

* Accepted to ICASSP 2018 (poster presentation) 


Convolutional Neural Networks for Sentiment Classification on Business Reviews

Oct 16, 2017
Andreea Salinca

Recently, Convolutional Neural Network (CNN) models have achieved remarkable results in text classification and sentiment analysis. In this paper, we present our approach to classifying business reviews using word embeddings on a large-scale dataset provided by Yelp: the Yelp 2017 challenge dataset. We compare word-based CNNs using several pre-trained word embeddings and end-to-end vector representations for text review classification. We conduct several experiments to capture the semantic relationships between business reviews, and we show that the obtained results are competitive with traditional methods.
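
The core of a word-based CNN classifier is a 1-D convolution over the word-embedding sequence followed by max-over-time pooling; the filter shapes below are illustrative, not the paper's configuration:

```python
import numpy as np

def text_cnn_features(embeddings, filters):
    """Apply 1-D convolution filters over a (seq_len, emb_dim) matrix of
    word embeddings, then ReLU and max-over-time pooling, yielding one
    feature per filter. A classifier layer would sit on top of `feats`."""
    seq_len, emb_dim = embeddings.shape
    n_filters, width, _ = filters.shape
    feats = np.empty(n_filters)
    for f in range(n_filters):
        # Slide the filter over every window of `width` consecutive words.
        acts = [np.sum(embeddings[t:t + width] * filters[f])
                for t in range(seq_len - width + 1)]
        feats[f] = np.max(np.maximum(acts, 0.0))  # ReLU + max-over-time pooling
    return feats
```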

* Proceedings of IJCAI Workshop on Semantic Machine Learning (SML 2017): 35-39, 5 pages 
