Ruth-Ann Armstrong

Seamless Deception: Larger Language Models Are Better Knowledge Concealers

Mar 15, 2026

Dhananjay Ashok, Ruth-Ann Armstrong, Jonathan May

Abstract: Language models (LMs) may acquire harmful knowledge and yet feign ignorance of these topics when under audit. Inspired by the recent discovery of deception-related behaviour patterns in LMs, we aim to train classifiers that detect when an LM is actively concealing knowledge. Initial findings on smaller models show that classifiers can detect concealment more reliably than human evaluators, with gradient-based concealment proving easier to identify than prompt-based methods. However, contrary to prior work, we find that the classifiers do not reliably generalize to unseen model architectures and topics of hidden knowledge. Most concerningly, the identifiable traces associated with concealment become fainter as the models increase in scale, with the classifiers achieving no better than random performance on any model exceeding 70 billion parameters. Our results expose a key limitation in black-box-only auditing of LMs and highlight the need to develop robust methods to detect models that are actively hiding the knowledge they contain.

JamPatoisNLI: A Jamaican Patois Natural Language Inference Dataset

Dec 07, 2022

Ruth-Ann Armstrong, John Hewitt, Christopher Manning

Abstract: JamPatoisNLI provides the first dataset for natural language inference in a creole language, Jamaican Patois. Many of the most-spoken low-resource languages are creoles. These languages commonly have a lexicon derived from a major world language and a distinctive grammar reflecting the languages of the original speakers and the process of language birth by creolization. This gives them a distinctive place in exploring the effectiveness of transfer from large monolingual or multilingual pretrained models. While our work, along with previous work, shows that transfer from these models to low-resource languages that are unrelated to languages in their training set is not very effective, we would expect stronger results from transfer to creoles. Indeed, our experiments show considerably better results from few-shot learning of JamPatoisNLI than for such unrelated languages, and help us begin to understand how the unique relationship between creoles and their high-resource base languages affects cross-lingual transfer. JamPatoisNLI, which consists of naturally occurring premises and expert-written hypotheses, is a step towards steering research into a traditionally underserved language and a useful benchmark for understanding cross-lingual NLP.

* 14 pages, 3 figures, Findings of EMNLP 2022
