Abstract: Understanding and predicting human behavior in the wild, particularly at urban intersections, remains crucial for enhancing interaction safety between road users. Among the most critical behaviors are the crossing intentions of Vulnerable Road Users (VRUs), where misinterpretation may result in dangerous conflicts with oncoming vehicles. In this work, we propose the VRU-CIPI framework, built on a sequential attention-based model designed to predict VRU crossing intentions at intersections. VRU-CIPI employs a Gated Recurrent Unit (GRU) to capture temporal dynamics in VRU movements, combined with a multi-head Transformer self-attention mechanism to encode the contextual and spatial dependencies critical for predicting crossing direction. Evaluated on the UCF-VRU dataset, our proposed approach achieves state-of-the-art performance with an accuracy of 96.45% and a real-time inference speed of 33 frames per second. Furthermore, by integrating with Infrastructure-to-Vehicle (I2V) communication, our approach can proactively enhance intersection safety through timely activation of crossing signals and early warnings to connected vehicles, ensuring smoother and safer interactions for all road users.
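The GRU recurrence used for temporal dynamics can be sketched in plain Python. This is a minimal, hypothetical single-layer GRU cell using the common update h_t = (1 - z_t) * h_{t-1} + z_t * h~_t; the random weights, the (x, y) input features, and the hidden size of 4 are illustrative assumptions, not VRU-CIPI's configuration, and the multi-head attention stage and crossing-direction classifier are omitted.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def matvec(m, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def add(*vs):
    # Element-wise sum of equal-length vectors.
    return [sum(vals) for vals in zip(*vs)]

class GRUCell:
    """Single GRU cell with update gate z, reset gate r, and candidate state h_tilde."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hidden_size = hidden_size
        # Input-to-hidden (W_*) and hidden-to-hidden (U_*) weights; biases start at zero.
        self.W_z, self.U_z = mat(hidden_size, input_size), mat(hidden_size, hidden_size)
        self.W_r, self.U_r = mat(hidden_size, input_size), mat(hidden_size, hidden_size)
        self.W_h, self.U_h = mat(hidden_size, input_size), mat(hidden_size, hidden_size)
        self.b_z = [0.0] * hidden_size
        self.b_r = [0.0] * hidden_size
        self.b_h = [0.0] * hidden_size

    def step(self, x, h):
        z = [sigmoid(v) for v in add(matvec(self.W_z, x), matvec(self.U_z, h), self.b_z)]
        r = [sigmoid(v) for v in add(matvec(self.W_r, x), matvec(self.U_r, h), self.b_r)]
        rh = [ri * hi for ri, hi in zip(r, h)]
        h_tilde = [math.tanh(v) for v in add(matvec(self.W_h, x), matvec(self.U_h, rh), self.b_h)]
        # Interpolate between the previous state and the candidate state.
        return [(1 - zi) * hi + zi * hti for zi, hi, hti in zip(z, h, h_tilde)]

    def encode(self, sequence):
        h = [0.0] * self.hidden_size
        for x in sequence:
            h = self.step(x, h)
        return h

# Hypothetical VRU track: normalized (x, y) positions over five frames.
track = [(0.10, 0.50), (0.15, 0.50), (0.20, 0.51), (0.26, 0.51), (0.32, 0.52)]
cell = GRUCell(input_size=2, hidden_size=4)
summary = cell.encode(track)
print(summary)
```

The final hidden state summarizes the whole track; in a full model it would be passed on to the attention and classification stages.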
Abstract: Understanding the factors contributing to traffic crashes and developing strategies to mitigate their severity is essential. Traditional statistical methods and machine learning models often struggle to capture the complex interactions between various factors and the unique characteristics of each crash. This research leverages a large language model (LLM) to analyze freeway crash data and provide a corresponding crash causation analysis. A training dataset encompassing environmental, driver, traffic, and geometric design factors was created by compiling 226 traffic safety studies related to freeway crashes. The Llama3 8B model was fine-tuned using QLoRA to enhance its understanding of freeway crashes and their contributing factors, as covered in these studies. The fine-tuned Llama3 8B model was then used to identify crash causation without pre-labeled data through zero-shot classification, providing comprehensive explanations to ensure that the identified causes were reasonable and aligned with existing research. Results demonstrate that LLMs effectively identify primary crash causes such as alcohol-impaired driving, speeding, aggressive driving, and driver inattention. Incorporating event data, such as road maintenance, offers deeper insights. The model's practical applicability and potential to improve traffic safety measures were validated by a high level of agreement (88.89%) among traffic safety researchers in a questionnaire. This research highlights the complex nature of traffic crashes and shows how LLMs can be used for comprehensive analysis of crash causation and other contributing factors. Moreover, it provides valuable insights and potential countermeasures to help planners and policymakers develop more effective and efficient traffic safety practices.
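QLoRA keeps the base weights frozen (in 4-bit precision) and trains only a low-rank update. The sketch below shows just the LoRA part of that idea in plain Python: the adapted layer computes y = Wx + (alpha / r) * B(Ax), where A is r x k and B is d x r. Quantization is not modeled, and the toy matrix, the rank, the scaling alpha, and the hypothetical 4096 x 4096 projection used for the parameter count are illustrative assumptions, not the settings used to fine-tune Llama3 8B.

```python
import random

def matvec(m, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in m]

class LoRALinear:
    """Frozen weight W (d x k) plus a trainable low-rank update B @ A."""

    def __init__(self, W, rank, alpha, seed=0):
        rng = random.Random(seed)
        d, k = len(W), len(W[0])
        self.W = W                      # frozen base weights
        self.scale = alpha / rank
        # A is randomly initialized; B starts at zero, so the adapter is a no-op at first.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(k)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d)]

    def forward(self, x):
        base = matvec(self.W, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

def trainable_params(d, k, rank):
    """Full fine-tuning updates d*k values; LoRA updates rank*(d + k)."""
    return d * k, rank * (d + k)

W = [[1.0, 0.0, 2.0], [0.5, -1.0, 0.0]]
layer = LoRALinear(W, rank=1, alpha=2.0)
print(layer.forward([1.0, 2.0, 3.0]))  # [7.0, -1.5], identical to W @ x while B is zero

full, lora = trainable_params(4096, 4096, 16)
print(full, lora)  # 16777216 131072
```

Initializing B to zero means fine-tuning starts exactly from the pretrained model, and only the small A and B matrices receive gradient updates.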
Abstract: Computer vision has advanced research methodologies, enhancing system services across various fields. It is a core component of traffic monitoring systems for improving road safety; however, these monitoring systems do not preserve the privacy of pedestrians who appear in the videos, potentially revealing their identities. To address this issue, our paper introduces Video-to-Text Pedestrian Monitoring (VTPM), which monitors pedestrian movements at intersections and generates real-time textual reports that include traffic signal and weather information. VTPM uses computer vision models for pedestrian detection and tracking, achieving a latency of 0.05 seconds per video frame. Additionally, it detects crossing violations with 90.2% accuracy by incorporating traffic signal data. The proposed framework uses Phi-3 mini-4k to generate real-time textual reports of pedestrian activity, stating safety concerns such as crossing violations, conflicts, and the impact of weather on pedestrian behavior, with a latency of 0.33 seconds. To enable comprehensive analysis of the generated reports, Phi-3 medium is fine-tuned for their historical analysis. This fine-tuning enables more reliable analysis of pedestrian safety at intersections, effectively detecting patterns and safety-critical events. The proposed VTPM offers a more efficient alternative to video footage: its textual reports reduce memory usage, with savings of up to 253 million percent, eliminate privacy issues, and enable comprehensive, interactive historical analysis.
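VTPM flags crossing violations by combining tracking output with traffic signal data. A minimal rule-based sketch of that combination follows; the rectangular crosswalk, the signal state names ("WALK", "DONT_WALK"), and the rule "a tracked pedestrian inside the crosswalk while the pedestrian signal shows DONT_WALK" are hypothetical simplifications, not VTPM's actual detection logic.

```python
from bisect import bisect_right
from dataclasses import dataclass

@dataclass
class Crosswalk:
    # Hypothetical axis-aligned crosswalk region in image coordinates.
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, x, y):
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max

def signal_at(t, changes):
    """Return the signal state at time t from a sorted list of (time, state) changes."""
    times = [c[0] for c in changes]
    i = bisect_right(times, t) - 1
    return changes[i][1] if i >= 0 else None

def find_violations(tracks, crosswalk, changes):
    """Yield (track_id, t) for the first tracked point inside the crosswalk during DONT_WALK."""
    for track_id, points in tracks.items():
        for t, x, y in points:
            if crosswalk.contains(x, y) and signal_at(t, changes) == "DONT_WALK":
                yield track_id, t
                break  # report each pedestrian once

crosswalk = Crosswalk(100, 200, 300, 260)
changes = [(0.0, "WALK"), (10.0, "DONT_WALK")]
tracks = {
    1: [(2.0, 120, 230), (4.0, 180, 232)],    # crosses during WALK
    2: [(11.0, 90, 230), (12.0, 150, 235)],   # enters after the signal changes
}
print(list(find_violations(tracks, crosswalk, changes)))  # [(2, 12.0)]
```

Each detected event is a small tuple rather than video, which is the kind of compact record a textual report could be generated from.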
Abstract: This paper introduces our solution for Track 2 of the AI City Challenge 2024. The task addresses traffic safety description and analysis on the Woven Traffic Safety (WTS) dataset, a real-world Pedestrian-Centric Traffic Video Dataset for Fine-grained Spatial-Temporal Understanding. Our solution focuses on the following points: 1) For dense video captioning, we leverage the dense video captioning with parallel decoding (PDVC) framework to model visual-language sequences and generate dense captions chapter by chapter for each video. 2) We leverage CLIP to extract visual features, enabling more efficient cross-modality training between visual and textual representations. 3) We conduct domain-specific model adaptation to mitigate the domain shift problem, which poses recognition challenges in video understanding. 4) Moreover, we leverage BDD-5K captioned videos for knowledge transfer, improving the understanding of WTS videos and the accuracy of captions. Evaluated on the test set, our solution achieved 6th place in the competition. The open source code will be available at https://github.com/UCF-SST-Lab/AICity2024CVPRW
Abstract: Previous studies on crash risk prediction primarily associated the number or likelihood of crashes on a road segment with the segment's traffic parameters or geometric characteristics, usually neglecting the impact of vehicles' continuous movement and their interactions with nearby vehicles. Advances in communication technologies have made it possible to collect driving information from surrounding vehicles, enabling the study of group-based crash risks. Based on high-resolution vehicle trajectory data, this research focused on vehicle groups as the subject of analysis and explored risk formation and propagation mechanisms, considering features of both vehicle groups and road segments. Several key factors contributing to crash risk were identified, including past high-risk vehicle-group states, complex vehicle behaviors, a high percentage of large vehicles, frequent lane changes within a vehicle group, and specific road geometries. A multinomial logistic regression model was developed to analyze spatial risk propagation patterns, which were classified based on the trend of high-risk occurrences within vehicle groups. The results indicated that extended periods of high-risk states, increases in vehicle-group size, and frequent lane changes are associated with adverse risk propagation patterns. Conversely, smoother traffic flow and high initial crash risk values are linked to risk dissipation. Furthermore, the study conducted sensitivity analyses on different types of classifiers, prediction time intervals, and adaptive time-to-collision (TTC) thresholds. The highest AUC for vehicle-group risk prediction exceeded 0.93. The findings provide valuable insights for researchers and practitioners in understanding and predicting vehicle-group safety, ultimately improving active traffic safety management and the operation of Connected and Autonomous Vehicles.
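Time-to-collision (TTC) is the gap between a follower and its leader divided by their closing speed, and it is finite only when the follower is faster. The sketch below flags each leader-follower pair in a vehicle group as high-risk when TTC falls below a threshold and reports the share of flagged pairs. The group snapshot and the thresholds swept are hypothetical; the paper's adaptive TTC thresholds and group-risk definition are not specified here.

```python
import math

def ttc(gap_m, v_follow, v_lead):
    """Time-to-collision in seconds; infinite when the follower is not closing in."""
    closing_speed = v_follow - v_lead
    return gap_m / closing_speed if closing_speed > 0 else math.inf

def group_risk_share(pairs, threshold_s):
    """Fraction of leader-follower pairs in a vehicle group with TTC below threshold_s."""
    flags = [ttc(gap, vf, vl) < threshold_s for gap, vf, vl in pairs]
    return sum(flags) / len(flags)

# Hypothetical group snapshot: (gap in m, follower speed in m/s, leader speed in m/s).
group = [(20.0, 30.0, 25.0), (8.0, 28.0, 22.0), (40.0, 25.0, 27.0)]
for threshold in (1.5, 3.0, 5.0):
    print(threshold, group_risk_share(group, threshold))
```

Sweeping `threshold_s` shows how strongly the high-risk label depends on the chosen cutoff, which is why a sensitivity analysis over TTC thresholds is worthwhile.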
Abstract: The EPC GEN 2 communication protocol for Ultra-High Frequency Radio Frequency Identification (RFID) offers a promising avenue for advancing the intelligence of transportation infrastructure. By linking vehicles to RFID readers that crowdsource information from RFID tags on road infrastructure, the RF-enhanced road infrastructure (REI) can potentially transform data acquisition for urban transportation. Despite this potential, the broader adoption of RFID technologies for building intelligent roads has been limited by a lack of understanding of how the GEN 2 protocol affects system performance under different transportation settings. This paper fills this knowledge gap by presenting the system architecture of REI and detailing its design challenges. Comprehensive real-world experiments are conducted to assess REI's effectiveness across various urban contexts. The results yield crucial insights into the optimal design of on-vehicle RFID readers and on-road RFID tags, considering the constraints imposed by vehicle dynamics, road geometries, and tag placements. With optimized designs of the encoding schemes for reader-tag communication and of the on-vehicle antennas, REI is able to fulfill the requirements of traffic sign inventory management and environmental monitoring, but falls short of the demands of high-speed navigation. In particular, the Miller 2 encoding scheme strikes the best balance between reading performance (e.g., throughput) and noise tolerance against multipath effects. Additionally, we show that the on-vehicle antenna should be oriented to maximize the available time for reading on-road tags, although this may reduce the power received by the tags in the forward link.
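In the Gen2 protocol, Miller encoding inverts the baseband phase in the middle of every data-1 symbol and between two consecutive data-0 symbols, and the result is multiplied by a square-wave subcarrier with M cycles per symbol (M = 2 for Miller 2). The sketch below reproduces only that bit-level rule; it ignores the preamble, pilot tone, and all analog behavior, and the initial phase of +1 is an arbitrary assumption.

```python
def miller_baseband(bits, start_level=1):
    """Return two half-symbol levels (+1/-1) per bit using the Miller rule."""
    level = start_level
    halves = []
    prev = None
    for bit in bits:
        if bit == 0:
            if prev == 0:
                level = -level          # phase inversion between consecutive data-0s
            halves += [level, level]    # no mid-symbol inversion for data-0
        else:
            halves += [level, -level]   # mid-symbol inversion for data-1
            level = -level
        prev = bit
    return halves

def miller_subcarrier(bits, m=2, start_level=1):
    """Multiply the baseband by a square wave with m cycles per symbol."""
    chips = []
    # m cycles per symbol = 2m half-cycles per symbol = m half-cycles per half-symbol.
    for level in miller_baseband(bits, start_level):
        for k in range(m):
            chips.append(level if k % 2 == 0 else -level)
    return chips

print(miller_baseband([1, 0, 0, 1]))          # [1, -1, -1, -1, 1, 1, 1, -1]
print(len(miller_subcarrier([1, 0, 0, 1])))  # 16
```

A larger M packs more subcarrier cycles into each bit, which improves robustness to noise at the cost of data rate for a given link frequency; Miller 2 sits at the fast end of that trade-off.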
Abstract: Accurate vehicle trajectory prediction is critical for automated vehicles and advanced driver assistance systems. Vehicle trajectory prediction consists of two essential tasks: longitudinal position prediction and lateral position prediction. In addition, driving intentions are significantly correlated with vehicle motion. In existing work, these three tasks (longitudinal position prediction, lateral position prediction, and driving intention recognition) are often conducted separately, without considering the relationships among them. In this paper, we propose a novel Temporal Multi-Gate Mixture-of-Experts (TMMOE) model for simultaneously predicting vehicle trajectory and driving intention. The proposed model consists of three layers: a shared layer, an expert layer, and a fully connected layer. The shared layer uses Temporal Convolutional Networks (TCN) to extract temporal features. The expert layer then captures task-specific information for each of the three tasks, and the fully connected layer integrates and outputs the prediction results. To achieve better performance, an uncertainty-based algorithm is used to construct the multi-task loss function. Finally, the TMMOE model is validated on the publicly available CitySim dataset, where it outperforms the LSTM model and achieves the highest classification and regression results. Keywords: vehicle trajectory prediction, driving intention classification, multi-task
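One widely used way to build an uncertainty-based multi-task loss (homoscedastic-uncertainty weighting) gives each task i a learnable log-variance s_i = log(sigma_i^2) and combines the task losses as sum_i 0.5 * exp(-s_i) * L_i + 0.5 * s_i. Whether TMMOE uses exactly this form is an assumption, and applying the same term to the classification task is a common simplification. The sketch only evaluates the weighted sum for fixed values; the loss values are hypothetical.

```python
import math

def uncertainty_weighted_loss(task_losses, log_vars):
    """Combine per-task losses with log-variances s_i = log(sigma_i^2).

    Each term is 0.5 * exp(-s_i) * L_i + 0.5 * s_i: a large s_i down-weights a
    noisy task, while the 0.5 * s_i term keeps s_i from growing without bound.
    """
    if len(task_losses) != len(log_vars):
        raise ValueError("need one log-variance per task")
    return sum(0.5 * math.exp(-s) * loss + 0.5 * s
               for loss, s in zip(task_losses, log_vars))

# Hypothetical losses for the longitudinal, lateral, and intention tasks.
losses = [4.0, 0.25, 1.0]
print(uncertainty_weighted_loss(losses, [0.0, 0.0, 0.0]))  # 2.625, a plain half-sum
# For fixed L_i, each term is minimized at s_i = log(L_i).
best = [math.log(loss) for loss in losses]
print(uncertainty_weighted_loss(losses, best))
```

In training, the s_i would be optimized jointly with the network weights, so the balance between the regression and classification tasks is learned rather than hand-tuned.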
Abstract: Large Language Models (LLMs) have shown remarkable effectiveness in various general-domain natural language processing (NLP) tasks. However, their performance on transportation safety tasks has been suboptimal, primarily because generating accurate responses requires specialized transportation safety expertise [1]. To address this challenge, we introduce TrafficSafetyGPT, a novel LLaMA-based model that has undergone supervised fine-tuning on the TrafficSafety-2K dataset, which contains human labels derived from government-produced guidebooks and ChatGPT-generated instruction-output pairs. Our proposed TrafficSafetyGPT model and the TrafficSafety-2K training dataset are available at https://github.com/ozheng1993/TrafficSafetyGPT.
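TrafficSafety-2K is described as instruction-output pairs used for supervised fine-tuning. The sketch below shows one plausible way such a pair could be turned into a training string; the field names, the Alpaca-style template, and the sample question are hypothetical and are not taken from the released dataset.

```python
import json

# Hypothetical Alpaca-style prompt template; the template TrafficSafetyGPT used may differ.
TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def to_training_text(record):
    """Return (prompt, prompt + target output) for one instruction-output pair."""
    prompt = TEMPLATE.format(instruction=record["instruction"].strip())
    return prompt, prompt + record["output"].strip()

# Hypothetical JSON record, as it might appear in a JSON Lines file.
line = json.dumps({
    "instruction": "What should a driver do when approaching a flashing red signal?",
    "output": "Come to a complete stop and proceed only when it is safe, as at a stop sign.",
})
prompt, text = to_training_text(json.loads(line))
print(text)
```

Returning the prompt separately makes it easy to exclude prompt tokens from the training loss, a common (but not universal) choice in supervised fine-tuning.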
Abstract: A real-time crash likelihood prediction model is an essential component of a proactive traffic safety management system. Over the years, numerous studies have attempted to construct crash likelihood prediction models to enhance traffic safety, but mostly for freeways. Most existing studies have employed deep learning-based frameworks to identify crash potential. Lately, the Transformer has emerged as a promising deep neural network architecture that fundamentally operates through attention-based mechanisms. The Transformer has several functional benefits over existing deep learning models such as Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN). First, a Transformer can readily handle long-term dependencies in a data sequence. Second, a Transformer can process all elements of a data sequence in parallel during training. Finally, a Transformer does not suffer from the vanishing gradient issue. Realizing the immense potential of Transformers, this paper proposes inTersection-Transformer (inTformer), a time-embedded attention-based Transformer model that can effectively predict intersection crash likelihood in real time. The proposed model was evaluated using connected vehicle data extracted from INRIX and the Center for Advanced Transportation Technology (CATT) Lab's Signal Analytics Platform. The data were formatted in parallel and stacked at different timesteps to develop nine inTformer models. The best inTformer model achieved a sensitivity of 73%. This model was also compared with earlier studies on crash likelihood prediction at intersections and with several established deep learning models trained on the same connected vehicle dataset. In every scenario, the inTformer outperformed the benchmark models, confirming the viability of the proposed inTformer architecture.
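The core operation behind a Transformer such as inTformer is attention. Below is a minimal single-head scaled dot-product self-attention, softmax(Q K^T / sqrt(d)) V, in plain Python over a short sequence of per-timestep feature vectors. The identity projections (Q = K = V = X), the hypothetical feature values, and the omission of the time embedding are simplifications for illustration, not the paper's architecture.

```python
import math

def softmax(row):
    m = max(row)                      # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(X):
    """Single-head scaled dot-product attention with Q = K = V = X (no learned projections)."""
    d = len(X[0])
    scores = [[sum(q * k for q, k in zip(qi, kj)) / math.sqrt(d) for kj in X] for qi in X]
    weights = [softmax(row) for row in scores]
    # Each output timestep is a weighted mix of all timesteps, computed independently.
    out = [[sum(w * v[c] for w, v in zip(wrow, X)) for c in range(d)] for wrow in weights]
    return out, weights

# Hypothetical per-timestep features, e.g. normalized speed and delay at an intersection.
X = [[0.2, 0.1], [0.8, 0.9], [0.3, 0.2], [0.9, 1.0]]
out, weights = self_attention(X)
for row in weights:
    print([round(w, 3) for w in row])
```

Because every row of `scores` depends only on `X`, all timesteps can be computed at once, which is the parallelism over sequence elements that the abstract contrasts with recurrent models.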
Abstract: Lane change (LC) is a continuous and complex operation process. Accurately detecting and predicting LC processes can help traffic participants better understand their surrounding environment, recognize potential LC safety hazards, and improve traffic safety. This paper focuses on LC processes, developing an LC intention recognition (LC-IR) model and an LC status prediction (LC-SP) model. A novel ensemble temporal convolutional network with Long Short-Term Memory units (TCN-LSTM) is first proposed to capture long-range dependencies in sequential data. Then, three multi-task models (MTL-LSTM, MTL-TCN, and MTL-TCN-LSTM) are developed to capture the intrinsic relationships among output indicators. Furthermore, a unified modeling framework for LC intention recognition and driving status prediction (LC-IR-SP) is developed. To validate the performance of the proposed models, a total of 1,023 vehicle trajectories were extracted from the CitySim dataset. The Pearson correlation coefficient is employed to identify the related indicators. The results indicate that, with an input length of 150 frames, the TCN-LSTM model achieves 96.67% accuracy, outperforming the TCN and LSTM models in LC intention classification and providing more balanced results across classes. The three proposed multi-task learning models deliver markedly better performance than their corresponding single-task models, with average reductions of 24.24% and 22.86% in Mean Absolute Error (MAE) and Root Mean Square Error (RMSE), respectively. The developed LC-IR-SP model has promising applications in autonomous vehicles for identifying lane change behaviors, calculating a real-time traffic conflict index, and improving vehicle control strategies.
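TCNs capture long-range dependencies by stacking dilated causal convolutions, in which output t sees only inputs t, t - d, t - 2d, and so on. The sketch below implements one such layer in plain Python and computes the receptive field of a stack with kernel size k and dilations 1, 2, 4, ...; it assumes one convolution per level (standard TCN residual blocks use two), and the kernel weights and kernel size are hypothetical rather than the TCN-LSTM configuration.

```python
def causal_dilated_conv(x, weights, dilation):
    """y[t] = sum_i weights[i] * x[t - i * dilation], with zeros before the sequence start."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(weights):
            j = t - i * dilation
            if j >= 0:                 # causal: never look at future samples
                acc += w * x[j]
        y.append(acc)
    return y

def receptive_field(kernel_size, num_levels):
    """Receptive field of stacked layers with dilations 1, 2, 4, ..., 2**(num_levels - 1)."""
    return 1 + (kernel_size - 1) * sum(2 ** level for level in range(num_levels))

def levels_needed(kernel_size, seq_len):
    """Smallest number of levels whose receptive field covers seq_len frames."""
    levels = 1
    while receptive_field(kernel_size, levels) < seq_len:
        levels += 1
    return levels

impulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(causal_dilated_conv(impulse, [0.5, 0.3, 0.2], dilation=2))  # [0.5, 0.0, 0.3, 0.0, 0.2, 0.0]
print(receptive_field(3, 6), levels_needed(3, 150))               # 127 7
```

Under these assumptions, covering the 150-frame input length reported in the abstract with k = 3 takes seven levels, since the receptive field roughly doubles with each added level.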