Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"Text": models, code, and papers

Detection of Lexical Stress Errors in Non-native (L2) English with Data Augmentation and Attention

Dec 29, 2020
Daniel Korzekwa, Roberto Barra-Chicote, Szymon Zaporowski, Grzegorz Beringer, Jaime Lorenzo-Trueba, Alicja Serafinowicz, Jasha Droppo, Thomas Drugman, Bozena Kostek

This paper describes two novel complementary techniques that improve the detection of lexical stress errors in non-native (L2) English speech: attention-based feature extraction and data augmentation based on Neural Text-To-Speech (TTS). In a classical approach, audio features are usually extracted from fixed regions of speech such as syllable nucleus. We propose an attention-based deep learning model that automatically derives optimal syllable-level representation from frame-level and phoneme-level audio features. Training this model is challenging because of the limited amount of incorrect stress patterns. To solve this problem, we propose to augment the training set with incorrectly stressed words generated with Neural TTS. Combining both techniques achieves 94.8\% precision and 49.2\% recall for the detection of incorrectly stressed words in L2 English speech of Slavic speakers.

* Submitted to ICASSP 2021 

  Access Paper or Ask Questions

Predicting Online Video Advertising Effects with Multimodal Deep Learning

Dec 22, 2020
Jun Ikeda, Hiroyuki Seshime, Xueting Wang, Toshihiko Yamasaki

With expansion of the video advertising market, research to predict the effects of video advertising is getting more attention. Although effect prediction of image advertising has been explored a lot, prediction for video advertising is still challenging with seldom research. In this research, we propose a method for predicting the click through rate (CTR) of video advertisements and analyzing the factors that determine the CTR. In this paper, we demonstrate an optimized framework for accurately predicting the effects by taking advantage of the multimodal nature of online video advertisements including video, text, and metadata features. In particular, the two types of metadata, i.e., categorical and continuous, are properly separated and normalized. To avoid overfitting, which is crucial in our task because the training data are not very rich, additional regularization layers are inserted. Experimental results show that our approach can achieve a correlation coefficient as high as 0.695, which is a significant improvement from the baseline (0.487).

* Accepted at International Conference on Pattern Recognition 2020 (ICPR) 

  Access Paper or Ask Questions

Knowledge Distillation for BERT Unsupervised Domain Adaptation

Oct 23, 2020
Minho Ryu, Kichun Lee

A pre-trained language model, BERT, has brought significant performance improvements across a range of natural language processing tasks. Since the model is trained on a large corpus of diverse topics, it shows robust performance for domain shift problems in which data distributions at training (source data) and testing (target data) differ while sharing similarities. Despite its great improvements compared to previous models, it still suffers from performance degradation due to domain shifts. To mitigate such problems, we propose a simple but effective unsupervised domain adaptation method, adversarial adaptation with distillation (AAD), which combines the adversarial discriminative domain adaptation (ADDA) framework with knowledge distillation. We evaluate our approach in the task of cross-domain sentiment classification on 30 domain pairs, advancing the state-of-the-art performance for unsupervised domain adaptation in text sentiment classification.

  Access Paper or Ask Questions

Generalized Conditioned Dialogue Generation Based on Pre-trained Language Model

Oct 21, 2020
Yan Zeng, Jian-Yun Nie

We investigate the general problem of conditioned dialogue, in which a condition label is used as input to designate the type of the target response such as a persona. A major challenge for conditioned dialogue generation is the lack of substantial dialogue data labeled with conditions. Thus, we propose to complement the labeled dialogue data with labeled non-dialogue text data, and fine-tune BERT based on them. Our fine-tuning approach utilizes BERT for both encoder and decoder via different input representations and self-attention masks in order to distinguish the source and target side. On the target (generation) side, we use a new attention routing mechanism to choose between generating a generic word or condition-related word at each position. Our model is instantiated to persona- and topic-related dialogue. Experimental results in both cases show that our approach can produce significantly better responses than the state-of-the-art baselines.

* 9 pages, 2 figures 

  Access Paper or Ask Questions

Fantastic Features and Where to Find Them: Detecting Cognitive Impairment with a Subsequence Classification Guided Approach

Oct 13, 2020
Benjamin Eyre, Aparna Balagopalan, Jekaterina Novikova

Despite the widely reported success of embedding-based machine learning methods on natural language processing tasks, the use of more easily interpreted engineered features remains common in fields such as cognitive impairment (CI) detection. Manually engineering features from noisy text is time and resource consuming, and can potentially result in features that do not enhance model performance. To combat this, we describe a new approach to feature engineering that leverages sequential machine learning models and domain knowledge to predict which features help enhance performance. We provide a concrete example of this method on a standard data set of CI speech and demonstrate that CI classification accuracy improves by 2.3% over a strong baseline when using features produced by this method. This demonstration provides an ex-ample of how this method can be used to assist classification in fields where interpretability is important, such as health care.

* EMNLP Workshop on Noisy User-generated Text (W-NUT 2020) 

  Access Paper or Ask Questions

On Long-Tailed Phenomena in Neural Machine Translation

Oct 10, 2020
Vikas Raunak, Siddharth Dalmia, Vivek Gupta, Florian Metze

State-of-the-art Neural Machine Translation (NMT) models struggle with generating low-frequency tokens, tackling which remains a major challenge. The analysis of long-tailed phenomena in the context of structured prediction tasks is further hindered by the added complexities of search during inference. In this work, we quantitatively characterize such long-tailed phenomena at two levels of abstraction, namely, token classification and sequence generation. We propose a new loss function, the Anti-Focal loss, to better adapt model training to the structural dependencies of conditional text generation by incorporating the inductive biases of beam search in the training process. We show the efficacy of the proposed technique on a number of Machine Translation (MT) datasets, demonstrating that it leads to significant gains over cross-entropy across different language pairs, especially on the generation of low-frequency words. We have released the code to reproduce our results.

* Accepted to Findings of EMNLP 2020 

  Access Paper or Ask Questions

SupMMD: A Sentence Importance Model for Extractive Summarization using Maximum Mean Discrepancy

Oct 06, 2020
Umanga Bista, Alexander Patrick Mathews, Aditya Krishna Menon, Lexing Xie

Most work on multi-document summarization has focused on generic summarization of information present in each individual document set. However, the under-explored setting of update summarization, where the goal is to identify the new information present in each set, is of equal practical interest (e.g., presenting readers with updates on an evolving news topic). In this work, we present SupMMD, a novel technique for generic and update summarization based on the maximum mean discrepancy from kernel two-sample testing. SupMMD combines both supervised learning for salience and unsupervised learning for coverage and diversity. Further, we adapt multiple kernel learning to make use of similarity across multiple information sources (e.g., text features and knowledge based concepts). We show the efficacy of SupMMD in both generic and update summarization tasks by meeting or exceeding the current state-of-the-art on the DUC-2004 and TAC-2009 datasets.

* EMNLP 2020 
* 15 pages 

  Access Paper or Ask Questions

Constrained Labeling for Weakly Supervised Learning

Sep 15, 2020
Chidubem Arachie, Bert Huang

Curation of large fully supervised datasets has become one of the major roadblocks for machine learning. Weak supervision provides an alternative to supervised learning by training with cheap, noisy, and possibly correlated labeling functions from varying sources. The key challenge in weakly supervised learning is combining the different weak supervision signals while navigating misleading correlations in their errors. In this paper, we propose a simple data-free approach for combining weak supervision signals by defining a constrained space for the possible labels of the weak signals and training with a random labeling within this constrained space. Our method is efficient and stable, converging after a few iterations of gradient descent. We prove theoretical conditions under which the worst-case error of the randomized label decreases with the rank of the linear constraints. We show experimentally that our method outperforms other weak supervision methods on various text- and image-classification tasks.

  Access Paper or Ask Questions

Open-Edit: Open-Domain Image Manipulation with Open-Vocabulary Instructions

Aug 04, 2020
Xihui Liu, Zhe Lin, Jianming Zhang, Handong Zhao, Quan Tran, Xiaogang Wang, Hongsheng Li

We propose a novel algorithm, named Open-Edit, which is the first attempt on open-domain image manipulation with open-vocabulary instructions. It is a challenging task considering the large variation of image domains and the lack of training supervision. Our approach takes advantage of the unified visual-semantic embedding space pretrained on a general image-caption dataset, and manipulates the embedded visual features by applying text-guided vector arithmetic on the image feature maps. A structure-preserving image decoder then generates the manipulated images from the manipulated feature maps. We further propose an on-the-fly sample-specific optimization approach with cycle-consistency constraints to regularize the manipulated images and force them to preserve details of the source images. Our approach shows promising results in manipulating open-vocabulary color, texture, and high-level attributes for various scenarios of open-domain images.

* To appear on ECCV 2020. Introduction video at and code at 

  Access Paper or Ask Questions

Including Images into Message Veracity Assessment in Social Media

Jul 20, 2020
Abderrazek Azri, Cécile Favre, Nouria Harbi, Jérôme Darmont

The extensive use of social media in the diffusion of information has also laid a fertile ground for the spread of rumors, which could significantly affect the credibility of social media. An ever-increasing number of users post news including, in addition to text, multimedia data such as images and videos. Yet, such multimedia content is easily editable due to the broad availability of simple and effective image and video processing tools. The problem of assessing the veracity of social network posts has attracted a lot of attention from researchers in recent years. However, almost all previous works have focused on analyzing textual contents to determine veracity, while visual contents, and more particularly images, remains ignored or little exploited in the literature. In this position paper, we propose a framework that explores two novel ways to assess the veracity of messages published on social networks by analyzing the credibility of both their textual and visual contents.

* 8th International Conference on Innovation and New Trends in Information Technology (INTIS 2019), Dec 2019, Tangier, Morocco 

  Access Paper or Ask Questions