Lydia Nishimwe

When the Gold Standard isn't Necessarily Standard: Challenges of Evaluating the Translation of User-Generated Content

Dec 19, 2025

Lydia Nishimwe, Benoît Sagot, Rachel Bawden

Abstract: User-generated content (UGC) is characterised by frequent use of non-standard language, from spelling errors to expressive choices such as slang, character repetitions, and emojis. This makes evaluating UGC translation particularly challenging: what counts as a "good" translation depends on the level of standardness desired in the output. To explore this, we examine the human translation guidelines of four UGC datasets, and derive a taxonomy of twelve non-standard phenomena and five translation actions (NORMALISE, COPY, TRANSFER, OMIT, CENSOR). Our analysis reveals notable differences in how UGC is treated, resulting in a spectrum of standardness in reference translations. Through a case study on large language models (LLMs), we show that translation scores are highly sensitive to prompts with explicit translation instructions for UGC, and that they improve when these align with the dataset's guidelines. We argue that when preserving UGC style is important, fair evaluation requires both models and metrics to be aware of translation guidelines. Finally, we call for clear guidelines during dataset creation and for the development of controllable, guideline-aware evaluation frameworks for UGC translation.

* 10 pages, 19 pages with references and appendices
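
As a rough illustration of what a guideline-aware prompt might look like, the sketch below encodes the five translation actions named in the abstract and turns a per-phenomenon guideline into explicit instructions. Only the action names come from the abstract; the wording of each action, the `build_prompt` helper, and the example guideline are assumptions made for illustration and are not taken from the paper or its datasets.

```python
from enum import Enum


class Action(Enum):
    """The five translation actions named in the abstract."""
    NORMALISE = "normalise"
    COPY = "copy"
    TRANSFER = "transfer"
    OMIT = "omit"
    CENSOR = "censor"


# Hypothetical one-line wordings for each action (assumed, not from the paper).
ACTION_WORDING = {
    Action.NORMALISE: "render it in standard target-language form",
    Action.COPY: "copy it unchanged into the translation",
    Action.TRANSFER: "use an equivalent non-standard form in the target language",
    Action.OMIT: "leave it out of the translation",
    Action.CENSOR: "mask it in the translation",
}


def build_prompt(text, src_lang, tgt_lang, guideline):
    """Build a translation prompt with one explicit instruction per phenomenon.

    `guideline` maps a phenomenon name (str) to an Action.
    """
    lines = [f"Translate the following {src_lang} text into {tgt_lang}."]
    for phenomenon, action in guideline.items():
        lines.append(f"- {phenomenon}: {ACTION_WORDING[action]}.")
    lines.append(f"Text: {text}")
    return "\n".join(lines)


# Hypothetical guideline covering phenomena mentioned in the abstract.
example_guideline = {
    "spelling errors": Action.NORMALISE,
    "emojis": Action.COPY,
    "character repetitions": Action.TRANSFER,
}

prompt = build_prompt("sooo goood 😍", "English", "French", example_guideline)
```

Swapping in a different guideline changes only the instruction lines, which is one way to compare how a model scores against references of differing standardness.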

Making Sentence Embeddings Robust to User-Generated Content

Mar 25, 2024

Lydia Nishimwe, Benoît Sagot, Rachel Bawden

Abstract: NLP models have been known to perform poorly on user-generated content (UGC), mainly because it presents a lot of lexical variations and deviates from the standard texts on which most of these models were trained. In this work, we focus on the robustness of LASER, a sentence embedding model, to UGC data. We evaluate this robustness by LASER's ability to represent non-standard sentences and their standard counterparts close to each other in the embedding space. Inspired by previous works extending LASER to other languages and modalities, we propose RoLASER, a robust English encoder trained using a teacher-student approach to reduce the distances between the representations of standard and UGC sentences. We show that with training only on standard and synthetic UGC-like data, RoLASER significantly improves LASER's robustness to both natural and artificial UGC data by achieving up to 2x and 11x better scores. We also perform a fine-grained analysis on artificial UGC data and find that our model greatly outperforms LASER on its most challenging UGC phenomena such as keyboard typos and social media abbreviations. Evaluation on downstream tasks shows that RoLASER performs comparably to or better than LASER on standard data, while consistently outperforming it on UGC data.

* Accepted at LREC-COLING 2024
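
The teacher-student idea in the abstract can be sketched with toy vectors: a frozen teacher embeds the standard sentence, and a student is penalised when its embeddings of the standard sentence and of its UGC variant drift from that target. The mean-squared-error objective, the function names, and the toy vectors below are assumptions for illustration; the abstract does not specify the loss, and no real LASER or RoLASER API is used here.

```python
import math


def mse(u, v):
    """Mean squared error between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)


def cosine_distance(u, v):
    """1 - cosine similarity; smaller means the embeddings are closer."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def distillation_loss(teacher_std, student_std, student_ugc):
    """Assumed objective: pull both student embeddings towards the
    teacher's embedding of the standard sentence."""
    return mse(teacher_std, student_std) + mse(teacher_std, student_ugc)


# Toy embeddings (made up): the student handles the standard sentence
# well, but its UGC embedding is far from the teacher's target.
teacher_std = [1.0, 0.0]
student_std = [1.0, 0.0]
student_ugc = [0.0, 1.0]

loss = distillation_loss(teacher_std, student_std, student_ugc)
gap = cosine_distance(teacher_std, student_ugc)
```

In this toy case the loss comes entirely from the UGC term, so minimising it would move the UGC embedding towards the standard one, which is the kind of closeness the abstract uses to measure robustness.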
