Aliaksei Severyn

Encode, Tag, Realize: High-Precision Text Editing

Sep 03, 2019
Eric Malmi, Sebastian Krause, Sascha Rothe, Daniil Mirylenka, Aliaksei Severyn

We propose LaserTagger - a sequence tagging approach that casts text generation as a text editing task. Target texts are reconstructed from the inputs using three main edit operations: keeping a token, deleting it, and adding a phrase before the token. To predict the edit operations, we propose a novel model, which combines a BERT encoder with an autoregressive Transformer decoder. This approach is evaluated on English text on four tasks: sentence fusion, sentence splitting, abstractive summarization, and grammar correction. LaserTagger achieves new state-of-the-art results on three of these tasks, performs comparably to a set of strong seq2seq baselines with a large number of training examples, and outperforms them when the number of examples is limited. Furthermore, we show that at inference time tagging can be more than two orders of magnitude faster than comparable seq2seq models, making it more attractive for running in a live environment.

* EMNLP 2019
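
The three edit operations named in the abstract can be illustrated with a minimal sketch. The tag representation used here (an operation paired with an optional phrase inserted before the token), the function name `realize`, and the example sentences are assumptions made for illustration; this is not the authors' implementation, and it covers only the final realization step, not the model that predicts the tags.

```python
from typing import List, Optional, Tuple

# Hypothetical tag format: (operation, phrase_to_add_before_token).
# The operation is "KEEP" or "DELETE"; the phrase may be None.
Tag = Tuple[str, Optional[str]]


def realize(tokens: List[str], tags: List[Tag]) -> str:
    """Rebuild the target text from source tokens and per-token edit tags."""
    if len(tokens) != len(tags):
        raise ValueError("need exactly one tag per token")
    output: List[str] = []
    for token, (operation, added_phrase) in zip(tokens, tags):
        # An added phrase is always placed before the current token.
        if added_phrase:
            output.extend(added_phrase.split())
        if operation == "KEEP":
            output.append(token)
        elif operation != "DELETE":
            raise ValueError(f"unknown operation: {operation}")
    return " ".join(output)


# Sentence fusion example: two sentences merged into one.
tokens = ["Turing", "was", "born", "in", "1912", ".",
          "Turing", "died", "in", "1954", "."]
tags = [("KEEP", None)] * 5 + [("DELETE", None), ("DELETE", "and")] \
    + [("KEEP", None)] * 4
fused = realize(tokens, tags)
```

Because the output is assembled from the input tokens plus a small set of added phrases, the prediction problem reduces to assigning one tag per source token.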

Leveraging Pre-trained Checkpoints for Sequence Generation Tasks

Jul 29, 2019
Sascha Rothe, Shashi Narayan, Aliaksei Severyn

Unsupervised pre-training of large neural models has recently revolutionized Natural Language Processing. Warm-starting from the publicly released checkpoints, NLP practitioners have pushed the state of the art on multiple benchmarks while saving significant amounts of compute time. So far the focus has been mainly on Natural Language Understanding tasks. In this paper, we present an extensive empirical study on the utility of initializing large Transformer-based sequence-to-sequence models with the publicly available pre-trained BERT and GPT-2 checkpoints for sequence generation. We have run over 300 experiments, spending thousands of TPU hours, to find the recipe that works best, and we demonstrate that it yields new state-of-the-art results on Machine Translation, Summarization, Sentence Splitting and Sentence Fusion.

Breaking the Softmax Bottleneck via Learnable Monotonic Pointwise Non-linearities

Feb 21, 2019
Octavian-Eugen Ganea, Sylvain Gelly, Gary Bécigneul, Aliaksei Severyn

The softmax function on top of a final linear layer is the de facto method to output probability distributions in neural networks. In many applications such as language models or text generation, this model has to produce distributions over large output vocabularies. Recently, this has been shown to have limited representational capacity due to its connection with the rank bottleneck in matrix factorization. However, little is known about the limitations of linear-softmax for quantities of practical interest such as cross entropy or mode estimation, a direction that we theoretically and empirically explore here. As an efficient and effective solution to alleviate this issue, we propose to learn parametric monotonic functions on top of the logits. We theoretically investigate the rank increasing capabilities of such monotonic functions. Empirically, our method improves in two different quality metrics over the traditional softmax-linear layer in synthetic and real language model experiments, adding little time or memory overhead, while being comparable to the more computationally expensive mixture of softmaxes.
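
As a rough illustration of applying a learnable monotonic pointwise function to the logits before the softmax, the sketch below uses one simple monotonic parametrization: the identity plus a non-negative mixture of sigmoids. This particular functional form, the parameter names `a`, `b`, `c`, and the function names are assumptions chosen for illustration and are not necessarily the parametrization used in the paper.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def monotonic_transform(z: float, a: Sequence[float],
                        b: Sequence[float], c: Sequence[float]) -> float:
    """Illustrative map f(z) = z + sum_k a_k * sigmoid(b_k * z + c_k).

    With a_k >= 0 and b_k >= 0, every term is non-decreasing in z and the
    identity term has slope 1, so f is strictly increasing.
    """
    if any(v < 0 for v in a) or any(v < 0 for v in b):
        raise ValueError("a and b must be non-negative to keep f monotonic")
    return z + sum(ak * sigmoid(bk * z + ck) for ak, bk, ck in zip(a, b, c))


def softmax(xs: Sequence[float]) -> List[float]:
    """Standard softmax with max-subtraction for stability."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def monotonic_softmax(logits: Sequence[float], a: Sequence[float],
                      b: Sequence[float], c: Sequence[float]) -> List[float]:
    """Apply the pointwise monotonic map to each logit, then normalize."""
    return softmax([monotonic_transform(z, a, b, c) for z in logits])


logits = [2.0, -1.0, 0.5]
probs = monotonic_softmax(logits, a=[1.0, 0.5], b=[2.0, 0.3], c=[0.0, -1.0])
```

Since the map is strictly increasing, the ranking of the logits is preserved; only the shape of the resulting distribution changes. In a trained model the parameters would be learned, with the non-negativity constraint enforced, for example, by a reparametrization.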

Eval all, trust a few, do wrong to none: Comparing sentence generation models

Oct 30, 2018
Ondřej Cífka, Aliaksei Severyn, Enrique Alfonseca, Katja Filippova

In this paper, we study recent neural generative models for text generation related to variational autoencoders. Previous works have employed various techniques to control the prior distribution of the latent codes in these models, which is important for sampling performance, but little attention has been paid to reconstruction error. In our study, we follow a rigorous evaluation protocol using a large set of previously used and novel automatic and human evaluation metrics, applied to both generated samples and reconstructions. We hope that it will become the new evaluation standard when comparing neural generative models for text.

* 12 pages (3 page appendix); v2: added hyperparameter settings, clarifications

Adversarial Neural Networks for Cross-lingual Sequence Tagging

Aug 14, 2018
Heike Adel, Anton Bryl, David Weiss, Aliaksei Severyn

We study cross-lingual sequence tagging with little or no labeled data in the target language. Adversarial training has previously been shown to be effective for training cross-lingual sentence classifiers. However, it is not clear if language-agnostic representations enforced by an adversarial language discriminator will also enable effective transfer for token-level prediction tasks. Therefore, we experiment with different types of adversarial training on two tasks: dependency parsing and sentence compression. We show that adversarial training consistently leads to improved cross-lingual performance on each task compared to a conventionally trained baseline.

On Accurate Evaluation of GANs for Language Generation

Jun 14, 2018
Stanislau Semeniuta, Aliaksei Severyn, Sylvain Gelly

Generative Adversarial Networks (GANs) are a promising approach to language generation. The latest works introducing novel GAN models for language generation use n-gram based metrics for evaluation and only report single scores of the best run. In this paper, we argue that this often misrepresents the true picture and does not tell the full story, as GAN models can be extremely sensitive to the random initialization and to small deviations from the best hyperparameter choice. In particular, we demonstrate that the previously used BLEU score is not sensitive to semantic deterioration of generated texts and propose alternative metrics that better capture the quality and diversity of the generated samples. We also conduct a set of experiments comparing a number of GAN models for text with a conventional Language Model (LM) and find that none of the considered models performs convincingly better than the LM.

Prosody Modifications for Question-Answering in Voice-Only Settings

Jun 11, 2018
Aleksandr Chuklin, Aliaksei Severyn, Johanne Trippas, Enrique Alfonseca, Hanna Silen, Damiano Spina

Many popular form factors of digital assistants, such as Amazon Echo, Apple HomePod or Google Home, enable the user to hold a conversation with the assistant based only on the speech modality. The lack of a screen from which the user can read text or watch supporting images or video presents unique challenges. In order to satisfy the information need of a user, we believe that the presentation of the answer needs to be optimized for such voice-only interactions. In this paper we propose the task of evaluating the usefulness of prosody modifications for the purpose of voice-only question answering. We describe a crowd-sourcing setup where we evaluate the quality of these modifications along multiple dimensions corresponding to the informativeness, naturalness, and ability of the user to identify the key part of the answer. In addition, we propose a set of simple prosodic modifications that highlight important parts of the answer using various acoustic cues.

Avoiding Your Teacher's Mistakes: Training Neural Networks with Controlled Weak Supervision

Dec 07, 2017
Mostafa Dehghani, Aliaksei Severyn, Sascha Rothe, Jaap Kamps

Training deep neural networks requires massive amounts of training data, but for many tasks only limited labeled data is available. This makes weak supervision attractive, using weak or noisy signals like the output of heuristic methods or user click-through data for training. In a semi-supervised setting, we can use a large set of data with weak labels to pretrain a neural network and then fine-tune the parameters with a small amount of data with true labels. This feels intuitively sub-optimal, as these two independent stages leave the model unaware of the varying label quality. What if we could somehow inform the model about the label quality? In this paper, we propose a semi-supervised learning method where we train two neural networks in a multi-task fashion: a "target network" and a "confidence network". The target network is optimized to perform a given task and is trained using a large set of unlabeled data that are weakly annotated. We propose to weight the gradient updates to the target network using the scores provided by the second network, the confidence network, which is trained on a small amount of supervised data. This prevents weight updates computed from noisy labels from harming the quality of the target network model. We evaluate our learning strategy on two different tasks: document ranking and sentiment classification. The results demonstrate that our approach not only enhances the performance compared to the baselines but also speeds up the learning process from weak labels.
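
The idea of weighting gradient updates by a confidence score can be sketched on a toy problem. The example below assumes a linear model with squared-error loss and takes the confidence as a given number in [0, 1]; in the paper that score comes from a separately trained confidence network, which is not modeled here. The function name `confidence_weighted_step` and all values are hypothetical.

```python
from typing import List, Sequence


def confidence_weighted_step(weights: Sequence[float],
                             features: Sequence[float],
                             weak_label: float,
                             confidence: float,
                             learning_rate: float = 0.1) -> List[float]:
    """One SGD step on a weakly labeled example, scaled by a confidence score.

    Toy setting (an assumption): linear prediction w . x with loss
    0.5 * (w . x - y)^2, whose gradient with respect to w is (w . x - y) * x.
    """
    prediction = sum(w * x for w, x in zip(weights, features))
    error = prediction - weak_label
    # The confidence multiplies the gradient: a low score shrinks the update
    # from a label that is believed to be noisy.
    return [w - learning_rate * confidence * error * x
            for w, x in zip(weights, features)]


start = [0.0, 0.0]
trusted = confidence_weighted_step(start, [1.0, 2.0], weak_label=1.0, confidence=1.0)
doubtful = confidence_weighted_step(start, [1.0, 2.0], weak_label=1.0, confidence=0.5)
ignored = confidence_weighted_step(start, [1.0, 2.0], weak_label=1.0, confidence=0.0)
```

A confidence of 1 recovers the ordinary update, while a confidence of 0 leaves the weights unchanged, so unreliable weak labels contribute little to training.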

Learning to Learn from Weak Supervision by Full Supervision

Nov 30, 2017
Mostafa Dehghani, Aliaksei Severyn, Sascha Rothe, Jaap Kamps

In this paper, we propose a method for training neural networks when we have a large set of data with weak labels and a small amount of data with true labels. In our proposed model, we train two neural networks: a target network (the learner) and a confidence network (the meta-learner). The target network is optimized to perform a given task and is trained using a large set of unlabeled data that are weakly annotated. We propose to control the magnitude of the gradient updates to the target network using the scores provided by the confidence network, which is trained on a small amount of supervised data. This prevents weight updates computed from noisy labels from harming the quality of the target network model.

* Accepted at NIPS Workshop on Meta-Learning (MetaLearn 2017), Long Beach, CA, USA

Neural Ranking Models with Weak Supervision

May 29, 2017
Mostafa Dehghani, Hamed Zamani, Aliaksei Severyn, Jaap Kamps, W. Bruce Croft

Despite the impressive improvements achieved by unsupervised deep neural networks in computer vision and NLP tasks, such improvements have not yet been observed in ranking for information retrieval. The reason may be the complexity of the ranking problem, as it is not obvious how to learn from queries and documents when no supervised signal is available. Hence, in this paper, we propose to train a neural ranking model using weak supervision, where labels are obtained automatically without human annotators or any external resources (e.g., click data). To this aim, we use the output of an unsupervised ranking model, such as BM25, as a weak supervision signal. We further train a set of simple yet effective ranking models based on feed-forward neural networks. We study their effectiveness under various learning scenarios (point-wise and pair-wise models) and using different input representations (i.e., from encoding query-document pairs into dense/sparse vectors to using word embedding representation). We train our networks using tens of millions of training instances and evaluate them on two standard collections: a homogeneous news collection (Robust) and a heterogeneous large-scale web collection (ClueWeb). Our experiments indicate that employing proper objective functions and letting the networks learn the input representation based on weakly supervised data leads to impressive performance, with over 13% and 35% MAP improvements over the BM25 model on the Robust and the ClueWeb collections, respectively. Our findings also suggest that supervised neural ranking models can greatly benefit from pre-training on large amounts of weakly labeled data that can be easily obtained from unsupervised IR models.

* In proceedings of The 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR2017)
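
To make the weak supervision signal concrete, the sketch below scores documents with a common BM25 variant and turns the scores into pairwise training labels. The BM25 parameters (`k1=1.2`, `b=0.75`), the smoothed IDF formula, the tiny tokenized corpus, and the helper names are assumptions for illustration; the paper's exact BM25 configuration and training pipeline are not reproduced here.

```python
import math
from collections import Counter
from itertools import combinations
from typing import List, Sequence, Tuple


def bm25_scores(query: Sequence[str], docs: Sequence[Sequence[str]],
                k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Score each tokenized document against the query with BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in query:
            tf = term_freq[term]
            if tf == 0:
                continue
            # Smoothed IDF variant that stays positive for frequent terms.
            idf = math.log((n_docs - doc_freq[term] + 0.5)
                           / (doc_freq[term] + 0.5) + 1.0)
            length_norm = 1.0 - b + b * len(doc) / avg_len
            score += idf * tf * (k1 + 1.0) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


def weak_pairwise_labels(scores: Sequence[float]) -> List[Tuple[int, int, int]]:
    """Build (i, j, label) triples; label is 1 if doc i outranks doc j, else 0."""
    pairs = []
    for i, j in combinations(range(len(scores)), 2):
        if scores[i] != scores[j]:  # ties carry no preference signal
            pairs.append((i, j, 1 if scores[i] > scores[j] else 0))
    return pairs


docs = [["neural", "ranking", "model"],
        ["cooking", "recipes"],
        ["neural", "networks", "for", "ranking", "documents"]]
scores = bm25_scores(["neural", "ranking"], docs)
pairs = weak_pairwise_labels(scores)
```

A pair-wise neural ranker could then be trained on such triples, while a point-wise model would regress on the BM25 scores directly; no human relevance judgments are needed in either case.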
