Paul Michel

Weight Poisoning Attacks on Pre-trained Models

Apr 14, 2020
Keita Kurita, Paul Michel, Graham Neubig

Recently, NLP has seen a surge in the usage of large pre-trained models. Users download weights of models pre-trained on large datasets, then fine-tune the weights on a task of their choice. This raises the question of whether downloading untrusted pre-trained weights can pose a security threat. In this paper, we show that it is possible to construct "weight poisoning" attacks where pre-trained weights are injected with vulnerabilities that expose "backdoors" after fine-tuning, enabling the attacker to manipulate the model prediction simply by injecting an arbitrary keyword. We show that by applying a regularization method, which we call RIPPLe, and an initialization procedure, which we call Embedding Surgery, such attacks are possible even with limited knowledge of the dataset and fine-tuning procedure. Our experiments on sentiment classification, toxicity detection, and spam detection show that this attack is widely applicable and poses a serious threat. Finally, we outline practical defenses against such attacks. Code to reproduce our experiments is available at https://github.com/neulab/RIPPLe.
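
A minimal sketch of the inner-product regularization idea is given below, assuming a PyTorch classifier and a standard loss function. The function name, batch format, and the lam weight are illustrative; this is not the released RIPPLe implementation.

    import torch

    def inner_product_poison_loss(model, poison_batch, clean_batch, loss_fn, lam=0.1):
        # Hedged sketch: penalize parameter directions where the poisoning
        # gradient conflicts with the (estimated) fine-tuning gradient, so
        # that the backdoor is more likely to survive downstream fine-tuning.
        params = [p for p in model.parameters() if p.requires_grad]

        poison_loss = loss_fn(model(poison_batch["x"]), poison_batch["y"])
        clean_loss = loss_fn(model(clean_batch["x"]), clean_batch["y"])

        # Keep the graph so the penalty stays differentiable w.r.t. the parameters.
        g_poison = torch.autograd.grad(poison_loss, params, create_graph=True)
        g_clean = torch.autograd.grad(clean_loss, params, create_graph=True)

        inner = sum((gp * gc).sum() for gp, gc in zip(g_poison, g_clean))
        penalty = torch.clamp(-inner, min=0.0)  # only active when the gradients conflict
        return poison_loss + lam * penalty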

* Published as a long paper at ACL 2020 

Optimizing Data Usage via Differentiable Rewards

Nov 22, 2019
Xinyi Wang, Hieu Pham, Paul Michel, Antonios Anastasopoulos, Graham Neubig, Jaime Carbonell

To acquire a new skill, humans learn better and faster if a tutor, based on their current knowledge level, informs them of how much attention they should pay to particular content or practice problems. Similarly, a machine learning model could potentially be trained better with a scorer that "adapts" to its current learning state and estimates the importance of each training data instance. Training such an adaptive scorer efficiently is a challenging problem; in order to precisely quantify the effect of a data instance at a given time during the training, it is typically necessary to first complete the entire training process. To efficiently optimize data usage, we propose a reinforcement learning approach called Differentiable Data Selection (DDS). In DDS, we formulate a scorer network as a learnable function of the training data, which can be efficiently updated along with the main model being trained. Specifically, DDS updates the scorer with an intuitive reward signal: it should up-weigh the data that has a similar gradient with a dev set upon which we would finally like to perform well. Without significant computing overhead, DDS delivers strong and consistent improvements over several strong baselines on two very different tasks of machine translation and image classification.
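
A rough sketch of the reward signal is shown below, assuming a PyTorch model and a standard loss function; the names and batch format are illustrative, and the full method additionally trains a scorer network against this reward.

    import torch

    def gradient_alignment_reward(model, loss_fn, train_batch, dev_batch):
        # Hedged sketch: cosine similarity between the gradient induced by a
        # training batch and the gradient on a held-out dev batch. A scorer
        # network would then learn to up-weight high-reward data.
        params = [p for p in model.parameters() if p.requires_grad]

        g_train = torch.autograd.grad(
            loss_fn(model(train_batch["x"]), train_batch["y"]), params)
        g_dev = torch.autograd.grad(
            loss_fn(model(dev_batch["x"]), dev_batch["y"]), params)

        dot = sum((gt * gd).sum() for gt, gd in zip(g_train, g_dev))
        norm = (sum((gt ** 2).sum() for gt in g_train).sqrt() *
                sum((gd ** 2).sum() for gd in g_dev).sqrt())
        return (dot / (norm + 1e-12)).item()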

Findings of the First Shared Task on Machine Translation Robustness

Jul 03, 2019
Xian Li, Paul Michel, Antonios Anastasopoulos, Yonatan Belinkov, Nadir Durrani, Orhan Firat, Philipp Koehn, Graham Neubig, Juan Pino, Hassan Sajjad

We share the findings of the first shared task on improving robustness of Machine Translation (MT). The task provides a testbed representing challenges facing MT models deployed in the real world, and facilitates new approaches to improve models' robustness to noisy input and domain mismatch. We focus on two language pairs (English-French and English-Japanese), and the submitted systems are evaluated on a blind test set consisting of noisy comments on Reddit and professionally sourced translations. As a new task, we received 23 submissions from 11 participating teams from universities, companies, national labs, etc. All submitted systems achieved large improvements over baselines, with the best improvement being +22.33 BLEU. We evaluated submissions with both human judgment and automatic evaluation (BLEU), which show high correlation (Pearson's r = 0.94 and 0.95). Furthermore, we conducted a qualitative analysis of the submitted systems using compare-mt, which revealed their salient differences in handling challenges in this task. Such analysis provides additional insights when there is occasional disagreement between human judgment and BLEU, e.g. systems better at producing colloquial expressions received higher scores in the human evaluation.

Are Sixteen Heads Really Better than One?

May 25, 2019
Paul Michel, Omer Levy, Graham Neubig

Attention is a powerful and ubiquitous mechanism for allowing neural models to focus on particular salient pieces of information by taking their weighted average when making predictions. In particular, multi-headed attention is a driving force behind many recent state-of-the-art NLP models such as Transformer-based MT models and BERT. These models apply multiple attention mechanisms in parallel, with each attention "head" potentially focusing on different parts of the input, which makes it possible to express sophisticated functions beyond the simple weighted average. In this paper we make the surprising observation that even if models have been trained using multiple heads, in practice, a large percentage of attention heads can be removed at test time without significantly impacting performance. In fact, some layers can even be reduced to a single head. We further examine greedy algorithms for pruning down models, and the potential speed, memory efficiency, and accuracy improvements obtainable therefrom. Finally, we analyze the results with respect to which parts of the model are more reliant on having multiple heads, and provide precursory evidence that training dynamics play a role in the gains provided by multi-head attention.
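
The sketch below illustrates the kind of test-time head ablation described above. It assumes a masking hook, set_head_mask, whose details depend on the particular Transformer implementation; it is not the paper's exact pruning procedure.

    import torch

    def head_ablation_delta(model, batches, loss_fn, set_head_mask, layer, head):
        # Hedged sketch: mask a single attention head at test time,
        # re-evaluate, and report the change in average loss.
        def avg_loss():
            with torch.no_grad():
                losses = [loss_fn(model(b["x"]), b["y"]) for b in batches]
            return torch.stack(losses).mean().item()

        baseline = avg_loss()
        set_head_mask(model, layer, head, 0.0)   # zero out the head's output
        ablated = avg_loss()
        set_head_mask(model, layer, head, 1.0)   # restore it
        return ablated - baseline  # a small delta suggests the head is expendable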

compare-mt: A Tool for Holistic Comparison of Language Generation Systems

Mar 19, 2019
Graham Neubig, Zi-Yi Dou, Junjie Hu, Paul Michel, Danish Pruthi, Xinyi Wang

In this paper, we describe compare-mt, a tool for holistic analysis and comparison of the results of systems for language generation tasks such as machine translation. The main goal of the tool is to give the user a high-level and coherent view of the salient differences between systems that can then be used to guide further analysis or system improvement. It implements a number of tools to do so, such as analysis of accuracy of generation of particular types of words, bucketed histograms of sentence accuracies or counts based on salient characteristics, and extraction of characteristic n-grams for each system. It also has a number of advanced features such as use of linguistic labels, source side data, or comparison of log likelihoods for probabilistic models, and also aims to be easily extensible by users to new types of analysis. The code is available at https://github.com/neulab/compare-mt
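
As a rough usage sketch, compare-mt is invoked on a reference file followed by one or more system output files; the file names below are placeholders, and the repository README documents the full set of options.

    import subprocess

    # Placeholder file names; the tool takes the reference followed by the
    # system outputs as positional arguments and prints a comparison report.
    subprocess.run(["compare-mt", "ref.txt", "sys1_out.txt", "sys2_out.txt"], check=True)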

* NAACL 2019 Demo Paper 

On Evaluation of Adversarial Perturbations for Sequence-to-Sequence Models

Mar 19, 2019
Paul Michel, Xian Li, Graham Neubig, Juan Miguel Pino

Adversarial examples, i.e. perturbations to the input of a model that elicit large changes in the output, have been shown to be an effective way of assessing the robustness of sequence-to-sequence (seq2seq) models. However, these perturbations only indicate weaknesses in the model if they do not change the input so significantly that it legitimately results in changes in the expected output. This fact has largely been ignored in the evaluations of the growing body of related literature. Using the example of untargeted attacks on machine translation (MT), we propose a new evaluation framework for adversarial attacks on seq2seq models that takes the semantic equivalence of the pre- and post-perturbation input into account. Using this framework, we demonstrate that existing methods may not preserve meaning in general, breaking the aforementioned assumption that source side perturbations should not result in changes in the expected output. We further use this framework to demonstrate that adding additional constraints on attacks allows for adversarial perturbations that are more meaning-preserving, but nonetheless largely change the output sequence. Finally, we show that performing untargeted adversarial training with meaning-preserving attacks is beneficial to the model in terms of adversarial robustness, without hurting test performance. A toolkit implementing our evaluation framework is released at https://github.com/pmichel31415/teapot-nlp.
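
One simplified way to operationalize this evaluation is sketched below. The similarity and quality scores are assumed to come from external metrics (human judgments or automatic scores), and the decision rule is illustrative rather than the paper's formal definition.

    def counts_as_successful_attack(src_similarity, tgt_score_orig, tgt_score_pert):
        # Hedged sketch: a perturbation only "counts" as a successful attack
        # if it degrades the output more than it changed the meaning of the input.
        #   src_similarity : similarity of original vs. perturbed source, in [0, 1]
        #   tgt_score_*    : translation quality before / after the perturbation
        relative_degradation = (tgt_score_orig - tgt_score_pert) / max(tgt_score_orig, 1e-9)
        return relative_degradation > (1.0 - src_similarity)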

* NAACL-HLT 2019 long paper 

MTNT: A Testbed for Machine Translation of Noisy Text

Sep 02, 2018
Paul Michel, Graham Neubig

Noisy or non-standard input text can cause disastrous mistranslations in most modern Machine Translation (MT) systems, and there has been growing research interest in creating noise-robust MT systems. However, as of yet there are no publicly available parallel corpora with naturally occurring noisy inputs and translations, and thus previous work has resorted to evaluating on synthetically created datasets. In this paper, we propose a benchmark dataset for Machine Translation of Noisy Text (MTNT), consisting of noisy comments on Reddit (www.reddit.com) and professionally sourced translations. We commissioned translations of English comments into French and Japanese, as well as French and Japanese comments into English, on the order of 7k-37k sentences per language pair. We qualitatively and quantitatively examine the types of noise included in this dataset, then demonstrate that existing MT models fail badly on a number of noise-related phenomena, even after performing adaptation on a small training set of in-domain data. This indicates that this dataset can provide an attractive testbed for methods tailored to handling noisy text in MT. The data is publicly available at www.cs.cmu.edu/~pmichel1/mtnt/.

* EMNLP 2018 Long Paper 

Extreme Adaptation for Personalized Neural Machine Translation

May 04, 2018
Paul Michel, Graham Neubig

Every person speaks or writes their own flavor of their native language, influenced by a number of factors: the content they tend to talk about, their gender, their social status, or their geographical origin. When attempting to perform Machine Translation (MT), these variations have a significant effect on how the system should perform translation, but this is not captured well by standard one-size-fits-all models. In this paper, we propose a simple and parameter-efficient adaptation technique that only requires adapting the bias of the output softmax to each particular user of the MT system, either directly or through a factored approximation. Experiments on TED talks in three languages demonstrate improvements in translation accuracy, and better reflection of speaker traits in the target text.
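
A hedged sketch of such an output layer is shown below, assuming a PyTorch NMT decoder; the class and argument names are illustrative and do not reflect the authors' code.

    import torch.nn as nn

    class SpeakerBiasOutput(nn.Module):
        # Hedged sketch: "full" gives every user their own vocabulary-sized
        # bias vector added to the output logits; "factored" approximates it
        # as a small user embedding times a shared basis, which is far more
        # parameter-efficient.
        def __init__(self, hidden_dim, vocab_size, num_users, mode="factored", rank=32):
            super().__init__()
            self.proj = nn.Linear(hidden_dim, vocab_size)
            self.mode = mode
            if mode == "full":
                self.user_bias = nn.Embedding(num_users, vocab_size)
            else:
                self.user_embed = nn.Embedding(num_users, rank)
                self.basis = nn.Linear(rank, vocab_size, bias=False)

        def forward(self, hidden, user_id):
            logits = self.proj(hidden)                       # (batch, len, vocab)
            if self.mode == "full":
                bias = self.user_bias(user_id)               # (batch, vocab)
            else:
                bias = self.basis(self.user_embed(user_id))  # (batch, vocab)
            return logits + bias.unsqueeze(1)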

* Accepted as a short paper at ACL 2018 

Does the Geometry of Word Embeddings Help Document Classification? A Case Study on Persistent Homology Based Representations

May 31, 2017
Paul Michel, Abhilasha Ravichander, Shruti Rijhwani

We investigate the pertinence of methods from algebraic topology for text data analysis. These methods enable the development of mathematically principled isometric-invariant mappings from a set of vectors to a document embedding, which is stable with respect to the geometry of the document in the selected metric space. In this work, we evaluate the utility of these topology-based document representations in traditional NLP tasks, specifically document clustering and sentiment classification. We find that the embeddings do not benefit text analysis. In fact, performance is worse than simple techniques like tf-idf, indicating that the geometry of the document does not provide enough variability for classification on the basis of topic or sentiment in the chosen datasets.
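
A minimal sketch of one such topology-based representation is given below, assuming the third-party ripser package for computing persistence diagrams; it is an illustrative instantiation rather than the authors' exact pipeline.

    import numpy as np
    from ripser import ripser  # assumes the third-party `ripser` package is installed

    def persistence_summary(word_vectors, maxdim=1):
        # Hedged sketch: summarize a document's word-embedding point cloud by
        # the lifetimes (death minus birth) of its topological features.
        diagrams = ripser(np.asarray(word_vectors), maxdim=maxdim)["dgms"]
        features = []
        for dgm in diagrams:
            finite = dgm[np.isfinite(dgm[:, 1])]
            lifetimes = finite[:, 1] - finite[:, 0] if len(finite) else np.zeros(1)
            features.extend([lifetimes.sum(), lifetimes.max(), float(len(finite))])
        return np.array(features)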

* 5 pages, 3 figures. Rep4NLP workshop at ACL 2017 