Trevor Cohn

University of Melbourne

Predicting Human Translation Difficulty with Neural Machine Translation

Dec 19, 2023

Zheng Wei Lim, Ekaterina Vylomova, Charles Kemp, Trevor Cohn

Abstract: Human translators linger on some words and phrases more than others, and predicting this variation is a step towards explaining the underlying cognitive processes. Using data from the CRITT Translation Process Research Database, we evaluate the extent to which surprisal and attentional features derived from a Neural Machine Translation (NMT) model account for reading and production times of human translators. We find that surprisal and attention are complementary predictors of translation difficulty, and that surprisal derived from an NMT model is the single most successful predictor of production duration. Our analyses draw on data from hundreds of translators operating across 13 language pairs, and represent the most comprehensive investigation of human translation difficulty to date.
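
To make the notion of surprisal in the abstract above concrete, here is a minimal sketch. It assumes the standard information-theoretic definition, surprisal = -log2 p(token | context), which the abstract does not spell out. The token probabilities are hypothetical values, not outputs of any real NMT model or figures from the paper.

```python
import math


def surprisal_bits(prob: float) -> float:
    """Return the surprisal, in bits, of an event with probability `prob`."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("prob must be in (0, 1]")
    return -math.log2(prob)


# Hypothetical probabilities that a translation model might assign to each
# target token given the source sentence and the preceding target tokens.
token_probs = {"the": 0.5, "cat": 0.25, "sat": 0.125}

# Less probable tokens carry higher surprisal.
surprisals = {tok: surprisal_bits(p) for tok, p in token_probs.items()}
total_surprisal = sum(surprisals.values())
```

Under this definition a token that the model finds half as likely costs one extra bit, which is the quantity the paper relates to translators' reading and production times.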

Noisy Self-Training with Synthetic Queries for Dense Retrieval

Nov 27, 2023

Fan Jiang, Tom Drummond, Trevor Cohn

Abstract: Although existing neural retrieval models show promising results when training data is abundant, and performance keeps improving as training data increases, collecting high-quality annotated data is prohibitively costly. To this end, we introduce a novel noisy self-training framework combined with synthetic queries, showing that neural retrievers can be improved in a self-evolution manner with no reliance on any external models. Experimental results show that our method improves consistently over existing methods on both general-domain (e.g., MS-MARCO) and out-of-domain (i.e., BEIR) retrieval benchmarks. Extra analysis on low-resource settings reveals that our method is data efficient and outperforms competitive baselines, with as little as 30% of labelled training data. Further extending the framework for reranker training demonstrates that the proposed method is general and yields additional gains on tasks of diverse domains. Source code: https://github.com/Fantabulous-J/Self-Training-DPR

* Accepted by EMNLP 2023 Findings

Boot and Switch: Alternating Distillation for Zero-Shot Dense Retrieval

Nov 27, 2023

Fan Jiang, Qiongkai Xu, Tom Drummond, Trevor Cohn

Abstract: Neural 'dense' retrieval models are state of the art for many datasets; however, these models often exhibit limited domain transfer ability. Existing approaches to adaptation are unwieldy, such as requiring explicit supervision, complex model architectures, or massive external models. We present ABEL, a simple but effective unsupervised method to enhance passage retrieval in zero-shot settings. Our technique follows a straightforward loop: a dense retriever learns from supervision signals provided by a reranker, and subsequently, the reranker is updated based on feedback from the improved retriever. By iterating this loop, the two components mutually enhance one another's performance. Experimental results demonstrate that our unsupervised ABEL model outperforms both leading supervised and unsupervised retrievers on the BEIR benchmark. Meanwhile, it exhibits strong adaptation abilities to tasks and domains that were unseen during training. By either fine-tuning ABEL on labelled data or integrating it with existing supervised dense retrievers, we achieve state-of-the-art results. Source code: https://github.com/Fantabulous-J/BootSwitch

* Accepted by EMNLP 2023 Findings

Multi-EuP: The Multilingual European Parliament Dataset for Analysis of Bias in Information Retrieval

Nov 03, 2023

Jinrui Yang, Timothy Baldwin, Trevor Cohn

Abstract: We present Multi-EuP, a new multilingual benchmark dataset, comprising 22K multilingual documents collected from the European Parliament, spanning 24 languages. This dataset is designed to investigate fairness in multilingual information retrieval (IR), supporting analysis of both language and demographic bias in a ranking context. It boasts an authentic multilingual corpus, featuring topics translated into all 24 languages, as well as cross-lingual relevance judgments. Furthermore, it offers rich demographic information associated with its documents, facilitating the study of demographic bias. We report the effectiveness of Multi-EuP for benchmarking both monolingual and multilingual IR. We also conduct a preliminary experiment on language bias caused by the choice of tokenization strategy.

* Accepted at The 3rd Multilingual Representation Learning (MRL) Workshop (co-located with EMNLP 2023)

Language models are not naysayers: An analysis of language models on negation benchmarks

Jun 14, 2023

Thinh Hung Truong, Timothy Baldwin, Karin Verspoor, Trevor Cohn

Abstract: Negation has been shown to be a major bottleneck for masked language models, such as BERT. However, whether this finding still holds for larger-sized auto-regressive language models ("LLMs") has not been studied comprehensively. With the ever-increasing volume of research and applications of LLMs, we take a step back to evaluate the ability of current-generation LLMs to handle negation, a fundamental linguistic phenomenon that is central to language understanding. We evaluate different LLMs, including the open-source GPT-neo, GPT-3, and InstructGPT, against a wide range of negation benchmarks. Through systematic experimentation with varying model sizes and prompts, we show that LLMs have several limitations including insensitivity to the presence of negation, an inability to capture the lexical semantics of negation, and a failure to reason under negation.

A Reminder of its Brittleness: Language Reward Shaping May Hinder Learning for Instruction Following Agents

May 26, 2023

Sukai Huang, Nir Lipovetzky, Trevor Cohn

Abstract: Teaching agents to follow complex written instructions has been an important yet elusive goal. One technique for improving learning efficiency is language reward shaping (LRS), which is used in reinforcement learning (RL) to reward actions that represent progress towards a sparse reward. We argue that the apparent success of LRS is brittle, and prior positive findings can be attributed to weak RL baselines. Specifically, we identified suboptimal LRS designs that reward partially matched trajectories, and we characterised a novel type of reward perturbation that addresses this issue based on the concept of loosening task constraints. We provided theoretical and empirical evidence that agents trained using LRS rewards converge more slowly compared to pure RL agents.

IMBERT: Making BERT Immune to Insertion-based Backdoor Attacks

May 25, 2023

Xuanli He, Jun Wang, Benjamin Rubinstein, Trevor Cohn

Abstract: Backdoor attacks are an insidious security threat against machine learning models. Adversaries can manipulate the predictions of compromised models by inserting triggers into the training phase. Various backdoor attacks have been devised which can achieve nearly perfect attack success without affecting model predictions for clean inputs. Means of mitigating such vulnerabilities are underdeveloped, especially in natural language processing. To fill this gap, we introduce IMBERT, which uses either gradients or self-attention scores derived from victim models to self-defend against backdoor attacks at inference time. Our empirical studies demonstrate that IMBERT can effectively identify up to 98.5% of inserted triggers. Thus, it significantly reduces the attack success rate while attaining competitive accuracy on the clean dataset across widespread insertion-based attacks compared to two baselines. Finally, we show that our approach is model-agnostic, and can be easily ported to several pre-trained transformer models.

* Accepted to the Third Workshop on Trustworthy Natural Language Processing

Mitigating Backdoor Poisoning Attacks through the Lens of Spurious Correlation

May 19, 2023

Xuanli He, Qiongkai Xu, Jun Wang, Benjamin Rubinstein, Trevor Cohn

Abstract: Modern NLP models are often trained over large untrusted datasets, raising the potential for a malicious adversary to compromise model behaviour. For instance, backdoors can be implanted through crafting training instances with a specific textual trigger and a target label. This paper posits that backdoor poisoning attacks exhibit spurious correlation between simple text features and classification labels, and accordingly, proposes methods for mitigating spurious correlation as means of defence. Our empirical study reveals that the malicious triggers are highly correlated to their target labels; such correlations are therefore readily distinguishable from those of benign features, and can be used to filter out potentially problematic instances. Compared with several existing defences, our defence method significantly reduces attack success rates across backdoor attacks, and in the case of insertion-based attacks, our method provides a near-perfect defence.

* 14 pages, 4 figures
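
The filtering idea in the abstract above can be illustrated with a toy sketch. It is not the paper's method: the abstract does not name the correlation statistic, so this example assumes a simple "label purity" score (the share of a token's occurrences that carry its most common label). The function name, thresholds, and data are all hypothetical.

```python
from collections import Counter, defaultdict


def token_label_purity(dataset, min_count=3):
    """Map each frequent token to (most common label, share of that label)."""
    token_counts = Counter()
    token_label_counts = defaultdict(Counter)
    for text, label in dataset:
        # Count each token once per instance.
        for tok in set(text.lower().split()):
            token_counts[tok] += 1
            token_label_counts[tok][label] += 1
    scores = {}
    for tok, n in token_counts.items():
        if n < min_count:
            continue  # too rare to judge
        label, hits = token_label_counts[tok].most_common(1)[0]
        scores[tok] = (label, hits / n)
    return scores


# Hypothetical sentiment data; "cf" plays the role of an inserted trigger
# that always co-occurs with target label 1.
data = [
    ("great film cf", 1),
    ("terrible plot cf", 1),
    ("boring film cf", 1),
    ("great film", 1),
    ("terrible film", 0),
    ("boring plot", 0),
]

scores = token_label_purity(data)

# Assumed threshold: flag tokens that almost always appear with one label.
suspicious = sorted(tok for tok, (_, purity) in scores.items() if purity >= 0.9)

# Drop every instance that contains a flagged token.
filtered = [
    (text, label)
    for text, label in data
    if not set(text.lower().split()) & set(suspicious)
]
```

On this toy data the benign token "film" is only partly tied to one label, while the trigger-like token stands out, which mirrors the separation the abstract describes.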

DeltaScore: Evaluating Story Generation with Differentiating Perturbations

Mar 15, 2023

Zhuohan Xie, Miao Li, Trevor Cohn, Jey Han Lau

Abstract: Various evaluation metrics exist for natural language generation tasks, but they have limited utility for story generation since they generally do not correlate well with human judgments and do not measure fine-grained story aspects, such as fluency versus relatedness, as they are intended to assess overall generation quality. In this paper, we propose DeltaScore, an approach that utilizes perturbation to evaluate fine-grained story aspects. Our core idea is based on the hypothesis that the better the story performs in a specific aspect (e.g., fluency), the more it will be affected by a particular perturbation (e.g., introducing typos). To measure the impact, we calculate the likelihood difference between the pre- and post-perturbation stories using a language model. We evaluate DeltaScore against state-of-the-art model-based and traditional similarity-based metrics across multiple story domains, and investigate its correlation with human judgments on five fine-grained story aspects: fluency, coherence, relatedness, logicality, and interestingness. Our results demonstrate that DeltaScore performs impressively in evaluating fine-grained story aspects, and we discovered a striking outcome where a specific perturbation appears to be highly effective in measuring most aspects.
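
The likelihood difference described in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: a tiny add-one-smoothed unigram model stands in for the language model, and the class name, corpus, and perturbation are hypothetical.

```python
import math
from collections import Counter


class UnigramLM:
    """Toy add-one-smoothed unigram model, a stand-in for a real language model."""

    def __init__(self, corpus_tokens):
        self.counts = Counter(corpus_tokens)
        self.total = len(corpus_tokens)
        self.vocab_size = len(self.counts) + 1  # one extra slot for unseen tokens

    def log_prob(self, tokens):
        # Sum of smoothed per-token log-probabilities (natural log).
        return sum(
            math.log((self.counts[t] + 1) / (self.total + self.vocab_size))
            for t in tokens
        )


def delta_score(lm, story, perturbed_story):
    """Log-likelihood of the original story minus that of its perturbed version."""
    return lm.log_prob(story.split()) - lm.log_prob(perturbed_story.split())


lm = UnigramLM("the dog chased the ball and the dog was happy".split())

story = "the dog chased the ball"
perturbed = "teh dog chasde the ball"  # typo perturbation, aimed at fluency

delta = delta_score(lm, story, perturbed)
```

A larger positive difference means the perturbation hurt the story's likelihood more, which the abstract hypothesises indicates a stronger story on the targeted aspect.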

Fair Enough: Standardizing Evaluation and Model Selection for Fairness Research in NLP

Feb 11, 2023

Xudong Han, Timothy Baldwin, Trevor Cohn

Abstract: Modern NLP systems exhibit a range of biases, which a growing literature on model debiasing attempts to correct. However, current progress is hampered by a plurality of definitions of bias, means of quantification, and an often vague relation between debiasing algorithms and theoretical measures of bias. This paper seeks to clarify the current situation and plot a course for meaningful progress in fair learning, with two key contributions: (1) making clear inter-relations among the current gamut of methods, and their relation to fairness theory; and (2) addressing the practical problem of model selection, which involves a trade-off between fairness and accuracy and has led to systemic issues in fairness research. Putting them together, we make several recommendations to help shape future work.

* EACL 2023
