Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Nigel Collier

Department of Language Engineering, UMIST, UK

Generative Language Models Exhibit Social Identity Biases

Oct 24, 2023

Tiancheng Hu, Yara Kyrychenko, Steve Rathje, Nigel Collier, Sander van der Linden, Jon Roozenbeek

Abstract:The surge in popularity of large language models has given rise to concerns about biases that these models could learn from humans. In this study, we investigate whether ingroup solidarity and outgroup hostility, fundamental social biases known from social science, are present in 51 large language models. We find that almost all foundational language models and some instruction fine-tuned models exhibit clear ingroup-positive and outgroup-negative biases when prompted to complete sentences (e.g., "We are..."). A comparison of LLM-generated sentences with human-written sentences on the internet reveals that these models exhibit similar level, if not greater, levels of bias than human text. To investigate where these biases stem from, we experimentally varied the amount of ingroup-positive or outgroup-negative sentences the model was exposed to during fine-tuning in the context of the United States Democrat-Republican divide. Doing so resulted in the models exhibiting a marked increase in ingroup solidarity and an even greater increase in outgroup hostility. Furthermore, removing either ingroup-positive or outgroup-negative sentences (or both) from the fine-tuning data leads to a significant reduction in both ingroup solidarity and outgroup hostility, suggesting that biases can be reduced by removing biased training data. Our findings suggest that modern language models exhibit fundamental social identity biases and that such biases can be mitigated by curating training data. Our results have practical implications for creating less biased large-language models and further underscore the need for more research into user interactions with LLMs to prevent potential bias reinforcement in humans.

* supplementary material, data, and code see https://osf.io/9ht32/?view_only=f0ab4b23325f4c31ad3e12a7353b55f5

Via

Access Paper or Ask Questions

POSQA: Probe the World Models of LLMs with Size Comparisons

Oct 20, 2023

Chang Shu, Jiuzhou Han, Fangyu Liu, Ehsan Shareghi, Nigel Collier

Figure 1 for POSQA: Probe the World Models of LLMs with Size Comparisons

Figure 2 for POSQA: Probe the World Models of LLMs with Size Comparisons

Figure 3 for POSQA: Probe the World Models of LLMs with Size Comparisons

Figure 4 for POSQA: Probe the World Models of LLMs with Size Comparisons

Abstract:Embodied language comprehension emphasizes that language understanding is not solely a matter of mental processing in the brain but also involves interactions with the physical and social environment. With the explosive growth of Large Language Models (LLMs) and their already ubiquitous presence in our daily lives, it is becoming increasingly necessary to verify their real-world understanding. Inspired by cognitive theories, we propose POSQA: a Physical Object Size Question Answering dataset with simple size comparison questions to examine the extremity and analyze the potential mechanisms of the embodied comprehension of the latest LLMs. We show that even the largest LLMs today perform poorly under the zero-shot setting. We then push their limits with advanced prompting techniques and external knowledge augmentation. Furthermore, we investigate whether their real-world comprehension primarily derives from contextual information or internal weights and analyse the impact of prompt formats and report bias of different objects. Our results show that real-world understanding that LLMs shaped from textual data can be vulnerable to deception and confusion by the surface form of prompts, which makes it less aligned with human behaviours.

* Accepted by EMNLP 2023 Findings

Via

Access Paper or Ask Questions

Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective

Oct 16, 2023

Huayang Li, Tian Lan, Zihao Fu, Deng Cai, Lemao Liu, Nigel Collier, Taro Watanabe, Yixuan Su

Figure 1 for Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective

Figure 2 for Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective

Figure 3 for Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective

Figure 4 for Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective

Abstract:There are a number of diverging hypotheses about the neural text degeneration problem, i.e., generating repetitive and dull loops, which makes this problem both interesting and confusing. In this work, we aim to advance our understanding by presenting a straightforward and fundamental explanation from the data perspective. Our preliminary investigation reveals a strong correlation between the degeneration issue and the presence of repetitions in training data. Subsequent experiments also demonstrate that by selectively dropping out the attention to repetitive words in training data, degeneration can be significantly minimized. Furthermore, our empirical analysis illustrates that prior works addressing the degeneration issue from various standpoints, such as the high-inflow words, the likelihood objective, and the self-reinforcement phenomenon, can be interpreted by one simple explanation. That is, penalizing the repetitions in training data is a common and fundamental factor for their effectiveness. Moreover, our experiments reveal that penalizing the repetitions in training data remains critical even when considering larger model sizes and instruction tuning.

* Accepted to NeurIPS 2023

Via

Access Paper or Ask Questions

FireAct: Toward Language Agent Fine-tuning

Oct 09, 2023

Baian Chen, Chang Shu, Ehsan Shareghi, Nigel Collier, Karthik Narasimhan, Shunyu Yao

Figure 1 for FireAct: Toward Language Agent Fine-tuning

Figure 2 for FireAct: Toward Language Agent Fine-tuning

Figure 3 for FireAct: Toward Language Agent Fine-tuning

Figure 4 for FireAct: Toward Language Agent Fine-tuning

Abstract:Recent efforts have augmented language models (LMs) with external tools or environments, leading to the development of language agents that can reason and act. However, most of these agents rely on few-shot prompting techniques with off-the-shelf LMs. In this paper, we investigate and argue for the overlooked direction of fine-tuning LMs to obtain language agents. Using a setup of question answering (QA) with a Google search API, we explore a variety of base LMs, prompting methods, fine-tuning data, and QA tasks, and find language agents are consistently improved after fine-tuning their backbone LMs. For example, fine-tuning Llama2-7B with 500 agent trajectories generated by GPT-4 leads to a 77% HotpotQA performance increase. Furthermore, we propose FireAct, a novel approach to fine-tuning LMs with trajectories from multiple tasks and prompting methods, and show having more diverse fine-tuning data can further improve agents. Along with other findings regarding scaling effects, robustness, generalization, efficiency and cost, our work establishes comprehensive benefits of fine-tuning LMs for agents, and provides an initial set of experimental designs, insights, as well as open questions toward language agent fine-tuning.

* Code, data, and models are available at https://fireact-agent.github.io

Via

Access Paper or Ask Questions

Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models

Aug 31, 2023

Yupan Huang, Zaiqiao Meng, Fangyu Liu, Yixuan Su, Nigel Collier, Yutong Lu

Figure 1 for Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models

Figure 2 for Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models

Figure 3 for Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models

Figure 4 for Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models

Abstract:Large language models exhibit enhanced zero-shot performance on various tasks when fine-tuned with instruction-following data. Multimodal instruction-following models extend these capabilities by integrating both text and images. However, existing models such as MiniGPT-4 face challenges in maintaining dialogue coherence in scenarios involving multiple images. A primary reason is the lack of a specialized dataset for this critical application. To bridge these gaps, we present SparklesChat, a multimodal instruction-following model for open-ended dialogues across multiple images. To support the training, we introduce SparklesDialogue, the first machine-generated dialogue dataset tailored for word-level interleaved multi-image and text interactions. Furthermore, we construct SparklesEval, a GPT-assisted benchmark for quantitatively assessing a model's conversational competence across multiple images and dialogue turns. Our experiments validate the effectiveness of SparklesChat in understanding and reasoning across multiple images and dialogue turns. Specifically, SparklesChat outperformed MiniGPT-4 on established vision-and-language benchmarks, including the BISON binary image selection task and the NLVR2 visual reasoning task. Moreover, SparklesChat scored 8.56 out of 10 on SparklesEval, substantially exceeding MiniGPT-4's score of 3.91 and nearing GPT-4's score of 9.26. Qualitative evaluations further demonstrate SparklesChat's generality in handling real-world applications. All resources will be available at https://github.com/HYPJUDY/Sparkles.

Via

Access Paper or Ask Questions

BAND: Biomedical Alert News Dataset

May 23, 2023

Zihao Fu, Meiru Zhang, Zaiqiao Meng, Yannan Shen, Anya Okhmatovskaia, David Buckeridge, Nigel Collier

Figure 1 for BAND: Biomedical Alert News Dataset

Figure 2 for BAND: Biomedical Alert News Dataset

Figure 3 for BAND: Biomedical Alert News Dataset

Figure 4 for BAND: Biomedical Alert News Dataset

Abstract:Infectious disease outbreaks continue to pose a significant threat to human health and well-being. To improve disease surveillance and understanding of disease spread, several surveillance systems have been developed to monitor daily news alerts and social media. However, existing systems lack thorough epidemiological analysis in relation to corresponding alerts or news, largely due to the scarcity of well-annotated reports data. To address this gap, we introduce the Biomedical Alert News Dataset (BAND), which includes 1,508 samples from existing reported news articles, open emails, and alerts, as well as 30 epidemiology-related questions. These questions necessitate the model's expert reasoning abilities, thereby offering valuable insights into the outbreak of the disease. The BAND dataset brings new challenges to the NLP world, requiring better disguise capability of the content and the ability to infer important information. We provide several benchmark tasks, including Named Entity Recognition (NER), Question Answering (QA), and Event Extraction (EE), to show how existing models are capable of handling these tasks in the epidemiology domain. To the best of our knowledge, the BAND corpus is the largest corpus of well-annotated biomedical outbreak alert news with elaborately designed questions, making it a valuable resource for epidemiologists and NLP researchers alike.

Via

Access Paper or Ask Questions

Biomedical Named Entity Recognition via Dictionary-based Synonym Generalization

May 22, 2023

Zihao Fu, Yixuan Su, Zaiqiao Meng, Nigel Collier

Figure 1 for Biomedical Named Entity Recognition via Dictionary-based Synonym Generalization

Figure 2 for Biomedical Named Entity Recognition via Dictionary-based Synonym Generalization

Figure 3 for Biomedical Named Entity Recognition via Dictionary-based Synonym Generalization

Figure 4 for Biomedical Named Entity Recognition via Dictionary-based Synonym Generalization

Abstract:Biomedical named entity recognition is one of the core tasks in biomedical natural language processing (BioNLP). To tackle this task, numerous supervised/distantly supervised approaches have been proposed. Despite their remarkable success, these approaches inescapably demand laborious human effort. To alleviate the need of human effort, dictionary-based approaches have been proposed to extract named entities simply based on a given dictionary. However, one downside of existing dictionary-based approaches is that they are challenged to identify concept synonyms that are not listed in the given dictionary, which we refer as the synonym generalization problem. In this study, we propose a novel Synonym Generalization (SynGen) framework that recognizes the biomedical concepts contained in the input text using span-based predictions. In particular, SynGen introduces two regularization terms, namely, (1) a synonym distance regularizer; and (2) a noise perturbation regularizer, to minimize the synonym generalization error. To demonstrate the effectiveness of our approach, we provide a theoretical analysis of the bound of synonym generalization error. We extensively evaluate our approach on a wide range of benchmarks and the results verify that SynGen outperforms previous dictionary-based models by notable margins. Lastly, we provide a detailed analysis to further reveal the merits and inner-workings of our approach.

Via

Access Paper or Ask Questions

PiVe: Prompting with Iterative Verification Improving Graph-based Generative Capability of LLMs

May 21, 2023

Jiuzhou Han, Nigel Collier, Wray Buntine, Ehsan Shareghi

Abstract:Large language models (LLMs) have shown great abilities of solving various natural language tasks in different domains. Due to the training objective of LLMs and their pretraining data, LLMs are not very well equipped for tasks involving structured data generation. We propose a framework, Prompting with Iterative Verification (PiVe), to improve graphbased generative capability of LLMs. We show how a small language model could be trained to act as a verifier module for the output of an LLM (i.e., ChatGPT), and to iteratively improve its performance via fine-grained corrective instructions. Additionally, we show how the verifier module could apply iterative corrections offline for a more cost-effective solution to the text-to-graph generation task. Experiments on three graph-based datasets show consistent improvement gained via PiVe. Additionally, we highlight how the proposed verifier module can be used as a data augmentation tool to help improve the quality of automatically generated parallel text-graph datasets. Our code and data are available at https://github.com/Jiuzhouh/PiVe.

Via

Access Paper or Ask Questions

Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder

Apr 08, 2023

Zihao Fu, Wai Lam, Qian Yu, Anthony Man-Cho So, Shengding Hu, Zhiyuan Liu, Nigel Collier

Figure 1 for Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder

Figure 2 for Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder

Figure 3 for Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder

Figure 4 for Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder

Abstract:The sequence-to-sequence (seq2seq) task aims at generating the target sequence based on the given input source sequence. Traditionally, most of the seq2seq task is resolved by the Encoder-Decoder framework which requires an encoder to encode the source sequence and a decoder to generate the target text. Recently, a bunch of new approaches have emerged that apply decoder-only language models directly to the seq2seq task. Despite the significant advancements in applying language models to the seq2seq task, there is still a lack of thorough analysis on the effectiveness of the decoder-only language model architecture. This paper aims to address this gap by conducting a detailed comparison between the encoder-decoder architecture and the decoder-only language model framework through the analysis of a regularized encoder-decoder structure. This structure is designed to replicate all behaviors in the classical decoder-only language model but has an encoder and a decoder making it easier to be compared with the classical encoder-decoder structure. Based on the analysis, we unveil the attention degeneration problem in the language model, namely, as the generation step number grows, less and less attention is focused on the source sequence. To give a quantitative understanding of this problem, we conduct a theoretical sensitivity analysis of the attention output with respect to the source input. Grounded on our analysis, we propose a novel partial attention language model to solve the attention degeneration problem. Experimental results on machine translation, summarization, and data-to-text generation tasks support our analysis and demonstrate the effectiveness of our proposed model.

Via

Access Paper or Ask Questions

COFFEE: A Contrastive Oracle-Free Framework for Event Extraction

Mar 25, 2023

Meiru Zhang, Yixuan Su, Zaiqiao Meng, Zihao Fu, Nigel Collier

Figure 1 for COFFEE: A Contrastive Oracle-Free Framework for Event Extraction

Figure 2 for COFFEE: A Contrastive Oracle-Free Framework for Event Extraction

Figure 3 for COFFEE: A Contrastive Oracle-Free Framework for Event Extraction

Figure 4 for COFFEE: A Contrastive Oracle-Free Framework for Event Extraction

Abstract:Event extraction is a complex information extraction task that involves extracting events from unstructured text. Prior classification-based methods require comprehensive entity annotations for joint training, while newer generation-based methods rely on heuristic templates containing oracle information such as event type, which is often unavailable in real-world scenarios. In this study, we consider a more realistic setting of this task, namely the Oracle-Free Event Extraction (OFEE) task, where only the input context is given without any oracle information, including event type, event ontology and trigger word. To solve this task, we propose a new framework, called COFFEE, which extracts the events solely based on the document context without referring to any oracle information. In particular, a contrastive selection model is introduced in COFFEE to rectify the generated triggers and handle multi-event instances. The proposed COFFEE outperforms state-of-the-art approaches under the oracle-free setting of the event extraction task, as evaluated on a public event extraction benchmark ACE05.

Via

Access Paper or Ask Questions