Robert Thomson

Domain-Adapted Pre-trained Language Models for Implicit Information Extraction in Crash Narratives

Oct 10, 2025

Xixi Wang, Jordanka Kovaceva, Miguel Costa, Shuai Wang, Francisco Camara Pereira, Robert Thomson


Abstract: Free-text crash narratives recorded in real-world crash databases have been shown to play a significant role in improving traffic safety. However, large-scale analyses remain difficult to implement, as there are no documented tools that can batch-process the unstructured, non-standardized text written by various authors with differing experience and attention to detail. In recent years, Transformer-based pre-trained language models (PLMs), such as Bidirectional Encoder Representations from Transformers (BERT) and large language models (LLMs), have demonstrated strong capabilities across various natural language processing tasks. These models can extract explicit facts from crash narratives, but their performance declines on inference-heavy tasks such as Crash Type identification, which can involve nearly 100 categories. Moreover, relying on closed LLMs through external APIs raises privacy concerns for sensitive crash data, and these black-box tools often underperform due to limited domain knowledge. Motivated by these challenges, we study whether compact open-source PLMs can support reasoning-intensive extraction from crash narratives. We target two challenging objectives on real-world crash narratives: 1) identifying the Manner of Collision for a crash, and 2) identifying the Crash Type for each vehicle involved in the crash event. To bridge domain gaps, we apply fine-tuning to inject task-specific knowledge into LLMs, using Low-Rank Adaptation (LoRA), and into BERT. Experiments on the authoritative real-world Crash Investigation Sampling System (CISS) dataset demonstrate that our fine-tuned compact models outperform strong closed LLMs, such as GPT-4o, while requiring only minimal training resources. Further analysis reveals that the fine-tuned PLMs can capture richer narrative details and even correct some mislabeled annotations in the dataset.
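The abstract names LoRA but gives no implementation details. As a rough illustration of the general idea only (not the authors' code), the sketch below shows how a LoRA adapter leaves a weight matrix W frozen and learns a low-rank update, so the effective weight is W + (alpha / r) * B A. The dimensions, the `alpha` value, and the helper names `matmul` and `lora_effective_weight` are all assumptions chosen for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the adapter rank.

    w: frozen weight, shape (d_out, d_in)
    a: trainable factor A, shape (r, d_in)
    b: trainable factor B, shape (d_out, r)
    """
    r = len(a)
    scale = alpha / r
    delta = matmul(b, a)  # (d_out, r) @ (r, d_in) -> (d_out, d_in)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Hypothetical toy sizes; real models use far larger matrices.
d_out, d_in, rank = 8, 8, 2
rng = random.Random(0)

w = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]    # frozen
a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]  # trainable
b = [[0.0] * rank for _ in range(d_out)]                              # trainable, starts at zero

w_eff = lora_effective_weight(w, a, b, alpha=16)

full_params = d_out * d_in           # parameters updated by full fine-tuning
lora_params = rank * (d_in + d_out)  # parameters updated by the adapter
print(full_params, lora_params)
```

Because B starts at zero, the adapted layer initially behaves exactly like the frozen one, and only the two small factors are trained. The gap between the two parameter counts grows quickly with matrix size, which is consistent with the abstract's claim of minimal training resources.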


Modelling Political Coalition Negotiations Using LLM-based Agents

Feb 18, 2024

Farhad Moghimifar, Yuan-Fang Li, Robert Thomson, Gholamreza Haffari


Abstract: Coalition negotiations are a cornerstone of parliamentary democracies, characterised by complex interactions and strategic communications among political parties. Despite their significance, the modelling of these negotiations has remained unexplored within the domain of Natural Language Processing (NLP), mostly due to a lack of suitable data. In this paper, we introduce coalition negotiations as a novel NLP task, and model it as a negotiation between large language model-based agents. We introduce a multilingual dataset, POLCA, comprising manifestos of European political parties and coalition agreements over a number of elections in these countries. This dataset addresses the current scope limitations in political negotiation modelling by providing a diverse, real-world basis for simulation. Additionally, we propose a hierarchical Markov decision process designed to simulate the process of coalition negotiation between political parties and predict the outcomes. We evaluate the performance of state-of-the-art large language models (LLMs) as agents in handling coalition negotiations, offering insights into their capabilities and paving the way for future advancements in political modelling.
