



Abstract: Free-text crash narratives recorded in real-world crash databases have been shown to play a significant role in improving traffic safety. However, large-scale analyses remain difficult to implement, as there are no documented tools that can batch-process the unstructured, non-standardized text written by authors with diverse experience and attention to detail. In recent years, Transformer-based pre-trained language models (PLMs), such as Bidirectional Encoder Representations from Transformers (BERT) and large language models (LLMs), have demonstrated strong capabilities across various natural language processing tasks. These models can extract explicit facts from crash narratives, but their performance declines on inference-heavy tasks such as Crash Type identification, which can involve nearly 100 categories. Moreover, relying on closed LLMs through external APIs raises privacy concerns for sensitive crash data, and these black-box tools often underperform due to limited domain knowledge. Motivated by these challenges, we study whether compact open-source PLMs can support reasoning-intensive extraction from crash narratives. We target two challenging objectives: 1) identifying the Manner of Collision for a crash, and 2) identifying the Crash Type for each vehicle involved in the crash event, both from real-world crash narratives. To bridge domain gaps, we fine-tune LLMs with Low-Rank Adaptation (LoRA) and fine-tune BERT to inject task-specific knowledge. Experiments on the authoritative real-world Crash Investigation Sampling System (CISS) dataset demonstrate that our fine-tuned compact models outperform strong closed LLMs, such as GPT-4o, while requiring only minimal training resources. Further analysis reveals that the fine-tuned PLMs can capture richer narrative details and even correct some mislabeled annotations in the dataset.
Abstract: Large language models (LLMs) have shown promise in table Question Answering (Table QA). However, extending these capabilities to multi-table QA remains challenging due to unreliable schema linking across complex tables. Existing methods based on semantic similarity work well only on simplified hand-crafted datasets and struggle to handle complex, real-world scenarios with numerous and diverse columns. To address this, we propose a graph-based framework that leverages human-curated relational knowledge to explicitly encode schema links and join paths. Given a natural language query, our method searches this graph to construct interpretable reasoning chains, aided by pruning and sub-path merging strategies to enhance efficiency and coherence. Experiments on both standard benchmarks and a realistic, large-scale dataset demonstrate the effectiveness of our approach. To our knowledge, this is the first multi-table QA system applied to truly complex industrial tabular data.
Abstract: This paper provides a general framework for efficiently obtaining the appropriate intervention time for collision avoidance systems to just avoid a rear-end crash. The proposed framework incorporates a driver comfort model and a vehicle model. We show that there is a relationship between driver steering manoeuvres, described in terms of acceleration and jerk, and the corresponding steering angle and steering angle rate profiles. We investigate how four different vehicle models influence the time at which steering needs to be initiated to avoid a rear-end collision. The models assessed were: a dynamic bicycle model (DM), a steady-state cornering model (SSCM), a kinematic model (KM) and a point mass model (PMM). We show that all four models can be described by a parameter-varying linear system. We provide three algorithms that use this linear system to compute the steering intervention time efficiently for all four vehicle models: two use backward reachability simulation and one uses forward simulation. Results show that the SSCM, KM and PMM do not accurately estimate the intervention time for a certain set of vehicle conditions. Due to its fast computation time, the DM with a backward reachability algorithm can be used for rapid offline safety benefit assessment, while the DM with a forward simulation algorithm is better suited for online real-time usage.