Tom Kocmi

Poor Man's Quality Estimation: Predicting Reference-Based MT Metrics Without the Reference

Jan 28, 2023

Vilém Zouhar, Shehzaad Dhuliawala, Wangchunshu Zhou, Nico Daheim, Tom Kocmi, Yuchen Eleanor Jiang, Mrinmaya Sachan

Abstract: Machine translation quality estimation (QE) predicts human judgements of a translation hypothesis without seeing the reference. State-of-the-art QE systems based on pretrained language models achieve remarkable correlations with human judgements, yet they are computationally heavy and require human annotations, which are slow and expensive to create. To address these limitations, we define the problem of metric estimation (ME), where one predicts automated metric scores, also without the reference. We show that even without access to the reference, our model can estimate automated metrics ($\rho$=60% for BLEU, $\rho$=51% for other metrics) at the sentence level. Because automated metrics correlate with human judgements, we can leverage the ME task for pre-training a QE model. For the QE task, we find that pre-training on TER is better ($\rho$=23%) than training from scratch ($\rho$=20%).
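The $\rho$ values above are correlations between predicted and actual scores. The abstract does not say which coefficient is used, so the following is only a minimal sketch assuming Pearson correlation; the function name `pearson` and the toy inputs are hypothetical and not taken from the paper.

```python
import math


def pearson(predicted, actual):
    """Pearson correlation between two equally long score lists.

    Assumption: the abstract's rho is treated as Pearson's r here;
    the paper may use a different coefficient.
    """
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    # Co-variation of the two lists around their means.
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    return cov / math.sqrt(var_p * var_a)


# Hypothetical sentence-level scores: model estimates vs. real metric values.
estimated_bleu = [10.0, 20.0, 30.0]
true_bleu = [20.0, 40.0, 60.0]
print(pearson(estimated_bleu, true_bleu))
```

A perfectly linear relation between estimates and true scores gives a value of 1 (up to floating-point error), and a perfectly inverted one gives -1.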

* Accepted at EACL23 (main)

The Reality of Multi-Lingual Machine Translation

Feb 25, 2022

Tom Kocmi, Dominik Macháček, Ondřej Bojar

Abstract: Our book "The Reality of Multi-Lingual Machine Translation" discusses the benefits and perils of using more than two languages in machine translation systems. While focused on the particular task of sequence-to-sequence processing and multi-task learning, the book aims somewhat beyond the area of natural language processing. Machine translation is for us a prime example of deep learning applications where human skills and learning capabilities are taken as a benchmark that many try to match and surpass. We document that some of the gains observed in multi-lingual translation may result from simpler effects than the assumed cross-lingual transfer of knowledge. In the first, rather general part, the book leads you from the motivation for multi-linguality, through the versatility of deep neural networks, especially in sequence-to-sequence tasks, to the complications of this learning. We conclude the general part with warnings against overly optimistic and unjustified explanations of the gains that neural networks demonstrate. In the second part, we fully delve into multi-lingual models, with a particularly careful examination of transfer learning as one of the more straightforward approaches utilizing additional languages. The recent multi-lingual techniques, including massive models, are surveyed, and practical aspects of deploying systems for many languages are discussed. The conclusion highlights the open problem of machine understanding and recalls two ethical aspects of building large-scale models: the inclusivity of research and its ecological trace.

* ISBN 978-80-88132-11-0. arXiv admin note: substantial text overlap with arXiv:2001.01622

To Ship or Not to Ship: An Extensive Evaluation of Automatic Metrics for Machine Translation

Jul 22, 2021

Tom Kocmi, Christian Federmann, Roman Grundkiewicz, Marcin Junczys-Dowmunt, Hitokazu Matsushita, Arul Menezes

Abstract: Automatic metrics are commonly used as the exclusive tool for declaring the superiority of one machine translation system's quality over another. The community's choice of automatic metric guides research directions and industrial developments by deciding which models are deemed better. Evaluating metric correlations has been limited to a small collection of human judgements. In this paper, we examine how reliable metrics are compared with human judgements on, to the best of our knowledge, the largest collection of human judgements. We investigate which metrics have the highest accuracy in making system-level quality rankings for pairs of systems, taking human judgement as a gold standard, which is the closest scenario to real metric usage. Furthermore, we evaluate the performance of various metrics across different language pairs and domains. Lastly, we show that the sole use of BLEU negatively affected the past development of improved models. We release the collection of human judgements of 4380 systems and 2.3M annotated sentences for further analysis and replication of our work.
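The pairwise, system-level accuracy described in this abstract can be illustrated as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `pairwise_accuracy`, the handling of human ties (skipped), and the toy scores are all hypothetical.

```python
from itertools import combinations


def pairwise_accuracy(human, metric):
    """Fraction of system pairs that the metric ranks the same way as humans.

    `human` and `metric` map a system name to its system-level score.
    Assumption: pairs that humans score identically are skipped.
    """
    agree = 0
    total = 0
    for sys_a, sys_b in combinations(sorted(human), 2):
        human_delta = human[sys_a] - human[sys_b]
        metric_delta = metric[sys_a] - metric[sys_b]
        if human_delta == 0:
            continue  # no gold ranking for this pair
        total += 1
        # The metric agrees when both differences have the same sign.
        if human_delta * metric_delta > 0:
            agree += 1
    return agree / total if total else float("nan")


# Hypothetical scores for three systems.
human_scores = {"sysA": 0.82, "sysB": 0.75, "sysC": 0.60}
metric_scores = {"sysA": 31.0, "sysB": 33.5, "sysC": 27.2}
print(pairwise_accuracy(human_scores, metric_scores))
```

In this toy example the metric ranks sysB above sysA while humans prefer sysA, so it agrees with the human ranking on two of the three pairs.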

On User Interfaces for Large-Scale Document-Level Human Evaluation of Machine Translation Outputs

Apr 21, 2021

Roman Grundkiewicz, Marcin Junczys-Dowmunt, Christian Federmann, Tom Kocmi

Abstract: Recent studies emphasize the need for document context in human evaluation of machine translations, but little research has been done on the impact of user interfaces on annotator productivity and the reliability of assessments. In this work, we compare human assessment data from the last two WMT evaluation campaigns collected via two different methods for document-level evaluation. Our analysis shows that a document-centric approach to evaluation, where the annotator is presented with the entire document context on a screen, leads to higher-quality segment- and document-level assessments. It improves the correlation between segment and document scores and increases inter-annotator agreement for document scores, but is considerably more time-consuming for annotators.

* Presented at HumEval, EACL 2021

THEaiTRE 1.0: Interactive generation of theatre play scripts

Feb 17, 2021

Rudolf Rosa, Tomáš Musil, Ondřej Dušek, Dominik Jurko, Patrícia Schmidtová, David Mareček, Ondřej Bojar, Tom Kocmi, Daniel Hrbek, David Košťák (+6 more)

Abstract: We present the first version of a system for interactive generation of theatre play scripts. The system is based on a vanilla GPT-2 model with several adjustments targeting specific issues we encountered in practice. We also list other issues we encountered but plan to solve only in a future version of the system. The presented system was used to generate a theatre play script planned for premiere in February 2021.

* Submitted to Text2Story workshop 2021

CUNI Systems for the Unsupervised and Very Low Resource Translation Task in WMT20

Oct 22, 2020

Ivana Kvapilíková, Tom Kocmi, Ondřej Bojar

Abstract: This paper presents a description of CUNI systems submitted to the WMT20 task on unsupervised and very low-resource supervised machine translation between German and Upper Sorbian. We experimented with training on synthetic data and pre-training on a related language pair. In the fully unsupervised scenario, we achieved 25.5 and 23.7 BLEU translating from and into Upper Sorbian, respectively. Our low-resource systems relied on transfer learning from German-Czech parallel data and achieved 57.4 BLEU and 56.1 BLEU, which is an improvement of 10 BLEU points over the baseline trained only on the available small German-Upper Sorbian parallel corpus.

* WMT20

Gender Coreference and Bias Evaluation at WMT 2020

Oct 12, 2020

Tom Kocmi, Tomasz Limisiewicz, Gabriel Stanovsky

Abstract: Gender bias in machine translation can manifest when choosing gender inflections based on spurious gender correlations, for example, always translating doctors as men and nurses as women. This can be particularly harmful as models become more popular and are deployed within commercial systems. Our work presents the largest evidence for the phenomenon in more than 19 systems submitted to the WMT over four diverse target languages: Czech, German, Polish, and Russian. To achieve this, we use WinoMT, a recent automatic test suite which examines gender coreference and bias when translating from English to languages with grammatical gender. We extend WinoMT to handle two new languages tested in WMT: Polish and Czech. We find that all systems consistently use spurious correlations in the data rather than meaningful contextual information.

* Accepted at WMT20

Announcing CzEng 2.0 Parallel Corpus with over 2 Gigawords

Jul 06, 2020

Tom Kocmi, Martin Popel, Ondřej Bojar

Abstract: We present a new release of the Czech-English parallel corpus CzEng 2.0 consisting of over 2 billion words (2 "gigawords") in each language. The corpus contains document-level information and is filtered with several techniques to lower the amount of noise. In addition to the data in the previous version of CzEng, it contains new authentic and also high-quality synthetic parallel data. CzEng is freely available for research and educational purposes.

THEaiTRE: Artificial Intelligence to Write a Theatre Play

Jun 25, 2020

Rudolf Rosa, Ondřej Dušek, Tom Kocmi, David Mareček, Tomáš Musil, Patrícia Schmidtová, Dominik Jurko, Ondřej Bojar, Daniel Hrbek, David Košťák (+3 more)

Abstract: We present THEaiTRE, a starting project aimed at automatic generation of theatre play scripts. This paper reviews related work and drafts an approach we intend to follow. We plan to adopt generative neural language models and hierarchical generation approaches, supported by summarization and machine translation methods, and complemented with a human-in-the-loop approach.

* Accepted to AI4Narratives2020

Exploring Benefits of Transfer Learning in Neural Machine Translation

Jan 06, 2020

Tom Kocmi

Abstract: Neural machine translation is known to require large numbers of parallel training sentences, which generally prevents it from excelling on low-resource language pairs. This thesis explores the use of cross-lingual transfer learning on neural networks as a way of solving the problem of the lack of resources. We propose several transfer learning approaches to reuse a model pretrained on a high-resource language pair. We pay particular attention to the simplicity of the techniques. We study two scenarios: (a) when we reuse the high-resource model without any prior modifications to its training process and (b) when we can prepare the first-stage high-resource model for transfer learning in advance. For the former scenario, we present a proof-of-concept method by reusing a model trained by other researchers. In the latter scenario, we present a method which reaches even larger improvements in translation performance. Apart from the proposed techniques, we focus on an in-depth analysis of transfer learning techniques and try to shed some light on transfer learning improvements. We show how our techniques address specific problems of low-resource languages and are suitable even in high-resource transfer learning. We evaluate the potential drawbacks and behavior by studying transfer learning in various situations, for example, under artificially damaged training corpora or with various model parts fixed.

* Defended PhD thesis
