Eva Hasler

Analyzing the Use of Influence Functions for Instance-Specific Data Filtering in Neural Machine Translation

Oct 24, 2022
Tsz Kin Lam, Eva Hasler, Felix Hieber

Customer feedback can be an important signal for improving commercial machine translation systems. One solution for fixing specific translation errors is to remove the related erroneous training instances and re-train the machine translation system, which we refer to as instance-specific data filtering. Influence functions (IF) have been shown to be effective at finding such relevant training examples for classification tasks such as image classification, toxic speech detection and entailment. Given a probing instance, IF find influential training examples by measuring the similarity of the probing instance to a set of training examples in gradient space. In this work, we examine the use of influence functions for neural machine translation (NMT). We propose two effective extensions to a state-of-the-art influence function and demonstrate on the sub-problem of copied training examples that IF can be applied more generally than handcrafted regular expressions.
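
As a rough illustration of the idea (not the paper's exact formulation), influence can be approximated by the similarity of loss gradients: a training pair whose gradient points in the same direction as the probing instance's gradient likely pushed the model towards that output. The sketch below assumes a generic seq2seq `model` and `loss_fn` and scores training pairs by a simple gradient dot product:

```python
import torch

def loss_gradient(model, src, tgt, loss_fn):
    # Flattened gradient of the translation loss for a single (src, tgt) pair.
    loss = loss_fn(model(src), tgt)
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])

def influence_scores(model, loss_fn, probe, train_pairs):
    # Dot product in gradient space between the probing instance and each training pair;
    # highly ranked pairs are candidates for instance-specific data filtering.
    g_probe = loss_gradient(model, probe[0], probe[1], loss_fn)
    return [torch.dot(loss_gradient(model, src, tgt, loss_fn), g_probe).item()
            for src, tgt in train_pairs]
```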

* Accepted at WMT 2022 

Automatic Evaluation and Analysis of Idioms in Neural Machine Translation

Oct 10, 2022
Christos Baziotis, Prashant Mathur, Eva Hasler

A major open problem in neural machine translation (NMT) is the translation of idiomatic expressions, such as "under the weather". The meaning of these expressions is not composed of the meanings of their constituent words, and NMT models tend to translate them literally (i.e., word by word), which leads to confusing and nonsensical translations. Research on idioms in NMT is limited and hindered by the absence of automatic methods for quantifying these errors. In this work, we first propose a novel metric for automatically measuring the frequency of literal translation errors without human involvement. Equipped with this metric, we present controlled translation experiments with models trained under different conditions (with/without the test-set idioms) and across a wide range of (global and targeted) metrics and test sets. We explore the role of monolingual pretraining and find that it yields substantial targeted improvements, even without observing any translation examples of the test-set idioms. In our analysis, we probe the role of idiom context. We find that randomly initialized models are more local or "myopic", as they are relatively unaffected by variations in the idiom context, unlike the pretrained ones.
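
A toy sketch of what such a metric can look like (the paper defines its own formulation; the word-level lexicon `literal_lexicon` below is a hypothetical stand-in): a hypothesis is flagged when it contains word-by-word translations of the idiom's constituent words.

```python
def literal_error_rate(hypotheses, idioms, literal_lexicon):
    # Fraction of hypotheses that translate their idiom word by word
    # (toy illustration, not the metric proposed in the paper).
    errors = 0
    for hyp, idiom in zip(hypotheses, idioms):
        hyp_tokens = set(hyp.lower().split())
        literal_words = {literal_lexicon[w] for w in idiom.lower().split() if w in literal_lexicon}
        if literal_words and literal_words <= hyp_tokens:
            errors += 1
    return errors / len(hypotheses)
```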

The Devil is in the Details: On the Pitfalls of Vocabulary Selection in Neural Machine Translation

May 13, 2022
Tobias Domhan, Eva Hasler, Ke Tran, Sony Trenous, Bill Byrne, Felix Hieber

Vocabulary selection, or lexical shortlisting, is a well-known technique to improve the latency of neural machine translation (NMT) models by constraining the set of allowed output words during inference. The chosen set is typically determined by separately trained alignment model parameters, independent of the source-sentence context at inference time. While vocabulary selection appears competitive with respect to automatic quality metrics in prior work, we show that it can fail to select the right set of output words, particularly for semantically non-compositional linguistic phenomena such as idiomatic expressions, leading to reduced translation quality as perceived by humans. Trading off latency for quality by increasing the size of the allowed set is often not an option in real-world scenarios. We propose a model of vocabulary selection, integrated into the neural translation model, that predicts the set of allowed output words from contextualized encoder representations. This restores the translation quality of an unconstrained system, as measured by human evaluations on WMT newstest2020 and idiomatic expressions, at an inference latency competitive with alignment-based selection using aggressive thresholds, thereby removing the dependency on separately trained alignment models.
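
A minimal sketch of the general idea, with assumed shapes and names rather than the paper's exact architecture: a projection over contextualized encoder states yields per-position vocabulary probabilities, which are pooled over the source sentence and thresholded to obtain the allowed output set.

```python
import torch
import torch.nn as nn

class VocabSelectionHead(nn.Module):
    # Predicts the set of allowed target words from encoder states (illustrative only).
    def __init__(self, hidden_dim, vocab_size):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, encoder_states, threshold=0.5):
        # encoder_states: (batch, src_len, hidden_dim)
        probs = torch.sigmoid(self.proj(encoder_states))  # per-position word probabilities
        sentence_probs = probs.max(dim=1).values          # pool over the source sentence
        return sentence_probs > threshold                 # boolean mask of allowed output words
```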

* NAACL 2022 

Neural Machine Translation Decoding with Terminology Constraints

May 09, 2018
Eva Hasler, Adrià De Gispert, Gonzalo Iglesias, Bill Byrne

Despite the impressive quality improvements yielded by neural machine translation (NMT) systems, controlling their translation output to adhere to user-provided terminology constraints remains an open problem. We describe our approach to constrained neural decoding based on finite-state machines and multi-stack decoding, which supports target-side constraints as well as constraints with corresponding aligned input text spans. We demonstrate the performance of our framework on multiple translation tasks and motivate the need for constrained decoding with attentions as a means of reducing misplacement and duplication when translating user constraints.
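
A toy sketch of tracking a single target-side constraint during left-to-right decoding (a hypothetical helper, far simpler than the finite-state, multi-stack machinery described in the paper):

```python
class ConstraintState:
    # Tracks progress through one target-side terminology constraint (toy illustration).
    def __init__(self, constraint_tokens, pos=0):
        self.tokens = constraint_tokens
        self.pos = pos  # number of constraint tokens produced so far

    def advance(self, token):
        # Return the state after the decoder emits `token`.
        if self.pos < len(self.tokens) and token == self.tokens[self.pos]:
            return ConstraintState(self.tokens, self.pos + 1)
        return ConstraintState(self.tokens, 0)

    def satisfied(self):
        return self.pos == len(self.tokens)

# A multi-stack decoder groups beam hypotheses by the number of satisfied constraints,
# so partially constrained hypotheses are not pruned against unconstrained ones.
```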

* Proceedings of NAACL-HLT 2018 

Accelerating NMT Batched Beam Decoding with LMBR Posteriors for Deployment

Apr 30, 2018
Gonzalo Iglesias, William Tambellini, Adrià De Gispert, Eva Hasler, Bill Byrne

We describe a batched beam decoding algorithm for NMT with LMBR n-gram posteriors, showing that LMBR techniques still yield gains on top of the best recently reported results with Transformers. We also discuss acceleration strategies for deployment, and the effect of the beam size and batching on memory and speed.

* Proceedings of NAACL-HLT 2018 

A Comparison of Neural Models for Word Ordering

Aug 05, 2017
Eva Hasler, Felix Stahlberg, Marcus Tomalin, Adrià de Gispert, Bill Byrne

We compare several language models for the word-ordering task and propose a new bag-to-sequence neural model based on attention-based sequence-to-sequence models. We evaluate the model on a large German WMT data set where it significantly outperforms existing models. We also describe a novel search strategy for LM-based word ordering and report results on the English Penn Treebank. Our best model setup outperforms prior work both in terms of speed and quality.

* Accepted for publication at INLG 2017 

SGNMT -- A Flexible NMT Decoding Platform for Quick Prototyping of New Models and Search Strategies

Jul 21, 2017
Felix Stahlberg, Eva Hasler, Danielle Saunders, Bill Byrne

This paper introduces SGNMT, our experimental platform for machine translation research. SGNMT provides a generic interface to neural and symbolic scoring modules (predictors) with left-to-right semantics, such as translation models like NMT, language models, translation lattices, $n$-best lists, or other kinds of scores and constraints. Predictors can be combined with other predictors to form complex decoding tasks. SGNMT implements a number of search strategies for traversing the space spanned by the predictors which are appropriate for different predictor constellations. Adding new predictors or decoding strategies is particularly easy, making it a very efficient tool for prototyping new research ideas. SGNMT is actively being used by students in the MPhil program in Machine Learning, Speech and Language Technology at the University of Cambridge for course work and theses, as well as for most of the research work in our group.
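
The predictor abstraction can be pictured roughly as follows; this is a conceptual sketch with assumed method names, not SGNMT's actual interface.

```python
class Predictor:
    # Conceptual sketch of a left-to-right scoring module (not SGNMT's exact API).
    def initialize(self, src_sentence):
        # Reset internal state for a new source sentence.
        raise NotImplementedError

    def predict_next(self):
        # Return a score for every possible next target token given the current history.
        raise NotImplementedError

    def consume(self, token):
        # Advance the internal state by one emitted target token.
        raise NotImplementedError

# A decoder combines several predictors by summing their weighted scores at each step,
# so an NMT model, an n-gram LM and a lattice constraint can all shape the same search.
```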

* Accepted as EMNLP 2017 demo paper 

Neural Machine Translation by Minimising the Bayes-risk with Respect to Syntactic Translation Lattices

Feb 13, 2017
Felix Stahlberg, Adrià de Gispert, Eva Hasler, Bill Byrne

We present a novel scheme to combine neural machine translation (NMT) with traditional statistical machine translation (SMT). Our approach borrows ideas from linearised lattice minimum Bayes-risk decoding for SMT. The NMT score is combined with the Bayes-risk of the translation according to the SMT lattice. This makes our approach much more flexible than $n$-best list or lattice rescoring as the neural decoder is not restricted to the SMT search space. We show an efficient and simple way to integrate risk estimation into the NMT decoder which is suitable for word-level as well as subword-unit-level NMT. We test our method on English-German and Japanese-English and report significant gains over lattice rescoring on several data sets for both single and ensembled NMT. The MBR decoder produces entirely new hypotheses far beyond simply rescoring the SMT search space or fixing UNKs in the NMT output.
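
Schematically, and using assumed notation rather than the paper's exact parameterisation, each hypothesis $\mathbf{y}$ can be scored as $\lambda \log P_{\mathrm{NMT}}(\mathbf{y} \mid \mathbf{x}) + \sum_{n} \theta_n \sum_{u \in \mathrm{ngrams}_n(\mathbf{y})} P(u \mid \mathcal{E})$, where $P(u \mid \mathcal{E})$ is the posterior probability of $n$-gram $u$ under the SMT lattice $\mathcal{E}$ and $\lambda$, $\theta_n$ are interpolation weights; the second term rewards hypotheses containing $n$-grams that the lattice considers likely, i.e. hypotheses with low Bayes-risk.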

* EACL2017 short paper 

The Edit Distance Transducer in Action: The University of Cambridge English-German System at WMT16

Jun 15, 2016
Felix Stahlberg, Eva Hasler, Bill Byrne

This paper presents the University of Cambridge submission to WMT16. Motivated by the complementary nature of syntactical machine translation and neural machine translation (NMT), we exploit the synergies of Hiero and NMT in different combination schemes. Starting out with a simple neural lattice rescoring approach, we show that the Hiero lattices are often too narrow for NMT ensembles. Therefore, instead of a hard restriction of the NMT search space to the lattice, we propose to loosely couple NMT and Hiero by composition with a modified version of the edit distance transducer. The loose combination outperforms lattice rescoring, especially when using multiple NMT systems in an ensemble.

Syntactically Guided Neural Machine Translation

May 19, 2016
Felix Stahlberg, Eva Hasler, Aurelien Waite, Bill Byrne

We investigate the use of hierarchical phrase-based SMT lattices in end-to-end neural machine translation (NMT). Weight pushing transforms the Hiero scores for complete translation hypotheses, with the full translation grammar score and full n-gram language model score, into posteriors compatible with NMT predictive probabilities. With a slightly modified NMT beam-search decoder we find gains over both Hiero and NMT decoding alone, with practical advantages in extending NMT to very large input and output vocabularies.
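
Schematically (assumed notation, not the paper's exact formulation), each candidate word $y_t$ in the beam is scored by a log-linear combination of the two models, $\lambda \log P_{\mathrm{NMT}}(y_t \mid y_{<t}, x) + (1-\lambda) \log P_{\mathrm{Hiero}}(y_t \mid y_{<t}, x)$, where $P_{\mathrm{Hiero}}$ is the posterior obtained from the weight-pushed lattice; the search therefore only expands words that both systems consider plausible.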

* ACL 2016 