With the advent of universal function approximators in reinforcement learning, the number of practical applications of deep reinforcement learning (DRL) has grown rapidly. Decision-making in automated driving has emerged as a chief application among them: the model takes sensor data or higher-order kinematic variables as input and produces a discrete choice or a continuous control output. However, the black-box nature of these models is a major limitation that restricts the real-world deployment of DRL in autonomous vehicles (AVs). Therefore, in this work, we focus on the interpretability of an attention-based DRL framework. We use a continuous proximal policy optimization (PPO)-based DRL algorithm as the baseline model and add a multi-head attention framework, with experiments conducted in an open-source AV simulation environment. We present analytical techniques for assessing the interpretability of the trained models in terms of explainability and causality, covering both spatial and temporal correlations. We show that the weights in the first attention head encode the positions of the neighboring vehicles, while the second head focuses exclusively on the leader vehicle. We also show that the ego vehicle's action depends causally, both spatially and temporally, on the vehicles in the target lane. Through these findings, we show that these techniques can help practitioners interpret the results of DRL algorithms.
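To make the attention-weight analysis concrete, here is a minimal NumPy sketch of ego-centric multi-head attention. The ego vehicle's query attends over itself and its neighbors, and each head produces one weight per vehicle. The per-vehicle feature layout (presence, x, y, vx, vy), the random projection matrices, and the function name `ego_attention` are illustrative assumptions. This is not the trained model from this work.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def ego_attention(ego, others, Wq, Wk, Wv):
    """Ego-query attention over [ego, neighbors]; returns outputs and per-head weights."""
    entities = np.vstack([ego, others])        # (N + 1, d): ego first, then neighbors
    n_heads, _, d_k = Wq.shape
    outputs, weights = [], []
    for h in range(n_heads):
        q = ego @ Wq[h]                        # (d_k,) query from the ego vehicle only
        K = entities @ Wk[h]                   # (N + 1, d_k) keys for every vehicle
        V = entities @ Wv[h]                   # (N + 1, d_k) values for every vehicle
        w = softmax(K @ q / np.sqrt(d_k))      # one attention weight per vehicle
        weights.append(w)
        outputs.append(w @ V)
    return np.concatenate(outputs), np.stack(weights)

# Hypothetical features per vehicle: [presence, x, y, vx, vy]; random weights for illustration.
rng = np.random.default_rng(0)
d, d_k, n_heads = 5, 4, 2
ego = rng.normal(size=d)
others = rng.normal(size=(3, d))
Wq, Wk, Wv = (rng.normal(size=(n_heads, d, d_k)) for _ in range(3))
out, attn = ego_attention(ego, others, Wq, Wk, Wv)
```

In a trained policy, the row `attn[h]` is the per-head distribution over vehicles that an interpretability analysis would relate to neighbor positions or to the leader vehicle.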
To address the low detection accuracy and high false-positive rates of transmission line detection in UAV (Unmanned Aerial Vehicle) images, we exploit the linear features and spatial distribution of transmission lines. We introduce an enhanced stochastic Hough transform technique tailored for detecting transmission lines against complex backgrounds. Our approach uses the Hessian matrix for initial preprocessing of transmission lines. It then applies boundary search and pixel row segmentation to separate transmission line regions from the background. This significantly reduces both false positives and missed detections, thereby improving the accuracy of transmission line identification. Experiments demonstrate that our method not only processes images faster but also yields better detection results than conventional and randomized Hough transform methods.
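The Hessian preprocessing step relies on the fact that a bright, thin line has strongly negative curvature across its direction. The NumPy sketch below shows this by computing the smaller Hessian eigenvalue at each pixel from finite differences. It assumes no Gaussian smoothing and a synthetic test image, and the function name is hypothetical. It does not reproduce this work's exact preprocessing, boundary search, or Hough stage.

```python
import numpy as np

def hessian_line_response(img):
    """Ridge response for bright, thin lines on a darker background.

    Uses finite-difference second derivatives without Gaussian smoothing
    (a simplifying assumption; practical pipelines usually smooth first).
    """
    img = np.asarray(img, dtype=float)
    gy, gx = np.gradient(img)            # axis 0 = rows (y), axis 1 = columns (x)
    gyy, gyx = np.gradient(gy)
    gxy, gxx = np.gradient(gx)
    off = 0.5 * (gxy + gyx)              # symmetrize the mixed derivative
    half_trace = 0.5 * (gxx + gyy)
    radius = np.sqrt((0.5 * (gxx - gyy)) ** 2 + off ** 2)
    lam_min = half_trace - radius        # smaller eigenvalue of the 2x2 Hessian
    # Bright lines give a strongly negative eigenvalue across the line direction.
    return np.maximum(0.0, -lam_min)

# Synthetic 21x21 image with one bright horizontal "transmission line" on row 10.
img = np.zeros((21, 21))
img[10, :] = 1.0
resp = hessian_line_response(img)
```

Thresholding `resp` yields candidate line pixels, which a Hough-style voting stage could then group into line segments.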
Anticipating the movement intentions of other drivers can help intelligent vehicles reduce collisions. Typically, when the human driver of another vehicle (referred to as the target vehicle) engages in specific behaviors, such as checking the rearview mirror before a lane change, this provides a valuable clue to that driver's intentions. Furthermore, the target driver's intentions can be influenced and shaped by the driving environment. For example, if the target vehicle is too close to a leading vehicle, its driver may abandon the lane change decision. Similarly, a following vehicle in the target lane that is too close to the target vehicle could lead the driver to reverse the decision to change lanes. Knowing the intentions of all vehicles in a traffic stream can help enhance traffic safety. Unfortunately, such information is often captured in the form of images and videos, and using this personally identifiable data to train a general model could violate user privacy. Federated Learning (FL) is a promising tool for resolving this conundrum, because it trains models efficiently without exposing the underlying data. This paper introduces a Personalized Federated Learning (PFL) model with an embedded long short-term transformer (LSTR) framework. The framework predicts drivers' intentions by leveraging in-vehicle videos (of driver movement, gestures, and expressions) and out-of-vehicle videos (of the vehicle's surroundings, namely the frontal and rear areas). The proposed PFL-LSTR framework is trained and tested on real-world driving data collected from human drivers on Interstate 65 in Indiana. The results suggest that PFL-LSTR exhibits high adaptability and high precision. They also suggest that out-of-vehicle information (particularly, the driver's rear-mirror viewing actions) is important because it helps reduce false positives and thereby enhances the precision of driver intention inference.
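The privacy argument rests on clients sharing model parameters rather than raw video. The sketch below shows one common way to personalize federated averaging. Each client's shared `body` parameters are averaged with weights proportional to local sample counts, while a client-specific `head` never leaves the device. The body/head split, the dictionary layout, and the function names are illustrative assumptions. The abstract does not specify how PFL-LSTR personalizes its model, and local training is omitted.

```python
import numpy as np

def aggregate_shared(bodies, sizes):
    """Sample-size-weighted average of the shared parameters (FedAvg-style)."""
    weights = np.asarray(sizes, dtype=float)
    weights /= weights.sum()
    return sum(w * b for w, b in zip(weights, bodies))

def personalized_round(clients, sizes):
    """One server round: average the shared body, keep each client's own head.

    Only parameters are exchanged; each client's raw data stay local.
    """
    global_body = aggregate_shared([c["body"] for c in clients], sizes)
    return [{"body": global_body.copy(), "head": c["head"].copy()} for c in clients]

# Two hypothetical clients with tiny parameter vectors.
clients = [
    {"body": np.zeros(2), "head": np.array([1.0])},
    {"body": np.full(2, 2.0), "head": np.array([7.0])},
]
updated = personalized_round(clients, sizes=[1, 3])
```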
With the progress of science and technology, the Internet of Things (IoT) has gradually entered people's lives, bringing great convenience and improving work efficiency. In particular, IoT devices can take over tasks that humans cannot perform. Unmanned Aerial Vehicles (UAVs), a new type of IoT vehicle, are an active research area with very promising development prospects. However, privacy and communication remain serious issues in drone applications. Most drones still rely on centralized cloud-based data processing, which may lead to leakage of the data they collect. At the same time, transferring the large amounts of data collected by drones to the cloud may incur substantial communication overhead. Federated learning, as a privacy-protection technique, can effectively address both problems. However, when applied to UAV networks, federated learning must also account for data heterogeneity, which is caused by regional differences in UAV regulation. In response, this paper proposes a new algorithm, FedBA, which optimizes the global model and addresses the data heterogeneity problem. We apply the algorithm to several real datasets, and the experimental results show that it outperforms the other algorithms evaluated and improves the accuracy of the UAVs' local models.
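Federated learning experiments often simulate this kind of data heterogeneity by splitting each class across clients according to Dirichlet-distributed proportions. A smaller concentration `alpha` produces more skewed, non-IID local datasets. The sketch below implements that generic partitioning scheme only to illustrate the problem FedBA targets. It is not FedBA itself, and this work may partition its data differently. The function name is hypothetical.

```python
import numpy as np

def dirichlet_partition(labels, n_clients, alpha, seed=0):
    """Split sample indices among clients with per-class Dirichlet proportions."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(n_clients)]
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        proportions = rng.dirichlet(np.full(n_clients, alpha))
        # Convert cumulative proportions into split points within this class.
        cut_points = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cut_points)):
            client_indices[client].extend(part.tolist())
    return [np.array(sorted(ix), dtype=int) for ix in client_indices]

labels = np.repeat(np.arange(4), 25)      # 100 samples, 4 classes
parts = dirichlet_partition(labels, n_clients=3, alpha=0.3)
```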
Pedestrian safety has become an important research topic due to the increasing number of pedestrian-involved crashes. To evaluate pedestrian safety proactively, surrogate safety measures (SSMs) have been widely used in traffic conflict-based studies, as they do not require historical crash data as inputs. However, most existing SSMs were developed under the assumption that road users maintain constant velocity and direction. Risk estimates based on this assumption are less stable, more likely to be exaggerated, and unable to capture drivers' evasive maneuvers. Considering these limitations of existing SSMs, this study proposes a probabilistic framework for estimating the risk of pedestrian-vehicle conflicts at intersections. The proposed framework relaxes the constant-speed assumption by predicting trajectories with Gaussian Process Regression, and it accounts for different possible driver maneuvers with a Random Forest model. Real-world LiDAR data collected at an intersection were used to evaluate the performance of the proposed framework. The newly developed framework identifies all pedestrian-vehicle conflicts. Compared to Time-to-Collision (TTC), the proposed framework provides a more stable risk estimation and captures the evasive maneuvers of vehicles. Moreover, it does not require expensive computational resources, which makes it an ideal choice for real-time proactive pedestrian safety solutions at intersections.
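The constant-velocity Time-to-Collision that the framework is compared against can be computed in closed form. Treat the pedestrian and vehicle as points that collide when their separation drops to a radius r. The TTC is then the smallest non-negative root of |p + v t| = r, where p and v are the relative position and velocity. The sketch below assumes 2-D positions in metres, velocities in m/s, and a hypothetical collision radius. It computes the baseline measure, not the proposed probabilistic framework.

```python
import math
import numpy as np

def time_to_collision(p_a, v_a, p_b, v_b, radius):
    """Constant-velocity TTC between two point agents; math.inf if they never meet."""
    p = np.asarray(p_b, dtype=float) - np.asarray(p_a, dtype=float)   # relative position
    v = np.asarray(v_b, dtype=float) - np.asarray(v_a, dtype=float)   # relative velocity
    c = p @ p - radius ** 2
    if c <= 0:                       # already within the collision radius
        return 0.0
    a = v @ v
    if a == 0:                       # no relative motion, so the gap never closes
        return math.inf
    b = 2.0 * (p @ v)
    disc = b * b - 4.0 * a * c
    if disc < 0:                     # paths never come within the radius
        return math.inf
    t = (-b - math.sqrt(disc)) / (2.0 * a)   # earlier root = first contact
    return t if t >= 0 else math.inf         # negative root means the agents are diverging

# Stationary pedestrian at the origin, vehicle 10 m away approaching at 2 m/s, radius 1 m.
ttc = time_to_collision((0.0, 0.0), (0.0, 0.0), (10.0, 0.0), (-2.0, 0.0), radius=1.0)
```

In this example the gap closes from 9 m at 2 m/s, giving a TTC of 4.5 s. Because the estimate assumes both agents hold their velocity, any braking or swerving invalidates it. This is the limitation that motivates the trajectory-prediction approach.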
In this paper, we jointly design power control and position dispatch for multi-unmanned aerial vehicle (UAV)-enabled communication in device-to-device (D2D) networks. Our objective is to maximize the total transmission rate of downlink users (DUs) while satisfying the quality of service (QoS) requirements of all D2D users. We comprehensively account for the interference between D2D communications and downlink transmissions. The original problem is strongly non-convex, so traditional optimization methods incur high computational complexity. Worse, their solutions are not guaranteed to be globally optimal. We therefore propose a novel graph neural network (GNN)-based approach that maps the considered system onto a specific graph structure and achieves the optimal solution with low complexity. Specifically, we first construct a GNN-based model for the proposed network, in which the transmission links and interference links are formulated as vertices and edges, respectively. Then, taking the channel state information and the coordinates of ground users as inputs, and the locations of the UAVs and the transmission powers of all transmitters as outputs, we learn the mapping from inputs to outputs by training the GNN parameters. Simulation results verify that the GNN can effectively learn, from training samples, how to maximize the total transmission rate of DUs. Moreover, they show that the proposed GNN-based method outperforms traditional methods.
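Both the objective and the graph construction can be illustrated with a channel-gain matrix `G`, where `G[i, j]` is the gain from transmitter j to receiver i. Each link's rate is log2(1 + SINR), and the sum over links is the quantity being maximized. Off-diagonal gains above a threshold define the interference edges between link vertices. The matrix layout, the noise value, and the threshold are illustrative assumptions. The GNN itself and the QoS constraints are not shown.

```python
import numpy as np

def sum_rate(G, p, noise):
    """Total rate (bit/s/Hz) when every link treats interference as noise."""
    G = np.asarray(G, dtype=float)
    p = np.asarray(p, dtype=float)
    received = G @ p                  # total power arriving at each receiver
    signal = np.diag(G) * p           # power from each receiver's own transmitter
    interference = received - signal
    sinr = signal / (interference + noise)
    return float(np.sum(np.log2(1.0 + sinr)))

def interference_edges(G, threshold=0.0):
    """Directed edges j -> i between link vertices whose cross gain exceeds a threshold."""
    G = np.asarray(G, dtype=float)
    n = G.shape[0]
    return [(j, i) for i in range(n) for j in range(n) if i != j and G[i, j] > threshold]

# Hypothetical two-link example: link 1's transmitter interferes with link 0's receiver.
G = np.array([[1.0, 0.2], [0.0, 1.0]])
rate = sum_rate(G, [1.0, 1.0], noise=1.0)
edges = interference_edges(G)
```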
Graph signals arise in various applications, ranging from sensor networks to social media data. Their high dimensionality implies that they often need to be compressed in order to be stored and transmitted. A common framework for graph signal compression is based on sampling, which yields a set of continuous-amplitude samples that must in turn be quantized into a finite-bit representation. In this work, we study the joint design of graph signal sampling and quantization for graph signal compression. We focus on bandlimited graph signals and show that the compression problem can be represented as a task-based quantization setup, in which the task is to recover the spectrum of the signal. Based on this equivalence, we propose a joint design of the sampling and recovery mechanisms for a fixed quantization mapping. We also present an iterative algorithm for dividing the available bit budget among the discretized samples. Furthermore, we show how the proposed approach can be realized using graph filters that combine elements corresponding to neighbouring nodes of the graph, thus facilitating distributed implementation at reduced complexity. Our numerical evaluations on both synthetic and real-world data show that the joint sampling and quantization method yields a compact finite-bit representation of high-dimensional graph signals. This representation allows reconstruction of the original signal with accuracy within a small gap of that achievable with infinite-resolution quantizers.
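As a point of reference for the joint design, the conventional pipeline can be sketched in a few lines of NumPy. It samples a bandlimited graph signal, quantizes the samples independently, and then recovers the spectrum by least squares. The path graph, the bandwidth K = 2, the sampling set, and the uniform quantizer are illustrative assumptions. This is the separate sample-then-quantize baseline, not the proposed joint sampling, recovery, and bit-allocation design.

```python
import numpy as np

def path_laplacian(n):
    """Combinatorial Laplacian L = D - A of an unweighted path graph."""
    A = np.zeros((n, n))
    for i in range(n - 1):
        A[i, i + 1] = A[i + 1, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

def uniform_quantize(x, bits, x_max):
    """Mid-rise uniform quantizer with 2**bits levels on [-x_max, x_max]."""
    levels = 2 ** bits
    step = 2.0 * x_max / levels
    idx = np.clip(np.floor((x + x_max) / step), 0, levels - 1)
    return -x_max + (idx + 0.5) * step

def sample_quantize_recover(x, U_k, sample_set, bits, x_max):
    """Quantize the sampled entries, then least-squares fit the K spectral coefficients."""
    q = uniform_quantize(x[sample_set], bits, x_max)
    coeffs, *_ = np.linalg.lstsq(U_k[sample_set], q, rcond=None)
    return U_k @ coeffs

_, U = np.linalg.eigh(path_laplacian(6))
U_k = U[:, :2]                          # K = 2 lowest-frequency eigenvectors
x = U_k @ np.array([1.0, 0.5])          # a bandlimited graph signal
S = [0, 2, 5]                           # hypothetical sampling set
err_16 = np.linalg.norm(x - sample_quantize_recover(x, U_k, S, bits=16, x_max=1.0))
err_2 = np.linalg.norm(x - sample_quantize_recover(x, U_k, S, bits=2, x_max=1.0))
```

Raising `bits` shrinks the gap to the unquantized reconstruction. The joint design instead chooses the sampling, recovery, and per-sample bit allocation together.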
Traffic simulation is an efficient and cost-effective way to test Autonomous Vehicles (AVs) in a complex and dynamic environment. Numerous studies have used traffic simulation for AV evaluation over the past decades. However, current simulation environments fall short on two fronts: the background vehicles (BVs) fail to exhibit naturalistic driving behavior, and existing environments do not test the entire AV pipeline in a modular fashion. This study proposes a simulation framework that creates a complex and naturalistic traffic environment. Specifically, we combine a modified version of the Simulation of Urban MObility (SUMO) simulator with the Car Learning to Act (CARLA) simulator. The resulting simulation environment can emulate the complexities of the external environment while providing realistic sensor outputs to the AV pipeline. In previous work, we created an open-source Python package called SUMO-Gym, which generates a realistic road network and naturalistic traffic through SUMO and combines them with OpenAI Gym to provide ease of use for the end user. We propose to extend this software by adding CARLA, which in turn enriches the ego vehicle's perception by providing realistic sensor outputs of the AV's surrounding environment. Using the proposed framework, AV perception, planning, and control can be tested in a complex and realistic driving environment. The performance of the proposed framework in generating outputs and evaluating AVs is demonstrated through several case studies.
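The division of labour in such a co-simulation can be illustrated with a lockstep loop. At each fixed time step, the traffic side advances the background vehicles. The sensor side then mirrors their states before producing sensor output for the AV pipeline. Both classes below are hypothetical stubs standing in for the roles of SUMO and CARLA. They neither use nor imitate either simulator's actual API, and the toy "sensor" simply reports distances to a fixed ego position.

```python
from dataclasses import dataclass, field

@dataclass
class TrafficSimStub:
    """Hypothetical stand-in for the traffic simulator's role (SUMO); not the SUMO API."""
    time: float = 0.0
    positions: dict = field(default_factory=lambda: {"bv_0": 0.0, "bv_1": 20.0})
    speeds: dict = field(default_factory=lambda: {"bv_0": 10.0, "bv_1": 8.0})

    def step(self, dt):
        """Advance background vehicles by one step and return their positions."""
        for vid, speed in self.speeds.items():
            self.positions[vid] += speed * dt
        self.time += dt
        return dict(self.positions)

@dataclass
class SensorSimStub:
    """Hypothetical stand-in for the sensor simulator's role (CARLA); not the CARLA API."""
    ego_x: float = 10.0
    time: float = 0.0
    actors: dict = field(default_factory=dict)

    def sync_and_step(self, states, dt):
        """Mirror background vehicles, advance, and emit a toy range reading."""
        self.actors.update(states)
        self.time += dt
        return {vid: abs(x - self.ego_x) for vid, x in self.actors.items()}

def co_simulate(n_steps, dt=0.1):
    """Run both stubs in lockstep with the same fixed time step."""
    traffic, sensors = TrafficSimStub(), SensorSimStub()
    readings = None
    for _ in range(n_steps):
        states = traffic.step(dt)                        # traffic side moves first
        readings = sensors.sync_and_step(states, dt)     # sensor side mirrors it
    return traffic, sensors, readings

traffic, sensors, readings = co_simulate(n_steps=10)
```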
Current autonomous vehicle (AV) simulators are built to provide the large-scale testing required to prove capabilities under varied conditions in a controlled, repeatable fashion. However, they have certain shortcomings, including the need for user expertise and complex, inconvenient tutorials for creating customized scenarios. The Simulation of Urban Mobility (SUMO) simulator, which has been presented as an open-source AV simulator, is used extensively. However, it suffers from similar issues, which make it difficult for entry-level practitioners to use without a significant time investment. In that regard, we provide two enhancements to the SUMO simulator, aimed at substantially improving the user experience and providing real-world variability in the surrounding traffic. First, we calibrate a car-following model, the Intelligent Driver Model (IDM), on highway and urban naturalistic driving data and automatically sample from the resulting parameter distributions to create the background vehicles. Second, we combine SUMO with OpenAI Gym, creating a Python package that can run simulations based on real-world highway and urban layouts. Its generic output observations and input actions can be processed by any AV pipeline. Our aim through these enhancements is to provide an easy-to-use platform that can be readily used for AV testing and validation.
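The Intelligent Driver Model gives the follower's acceleration as a_max[1 - (v/v0)^δ - (s*/s)^2], with desired gap s* = s0 + vT + vΔv/(2√(a_max b)). Here s is the gap to the leader and Δv is the closing speed. The sketch below implements that formula and samples one background vehicle's parameters from normal distributions. The distribution means and standard deviations, the 0.1 floor, and the helper names are hypothetical placeholders, not the calibrated values from this work.

```python
import math
import numpy as np

# Hypothetical (mean, std) pairs; NOT calibrated values from naturalistic data.
HYPOTHETICAL_PARAM_DISTS = {
    "v0": (30.0, 3.0),    # desired speed [m/s]
    "T": (1.5, 0.3),      # desired time headway [s]
    "a_max": (1.0, 0.2),  # maximum acceleration [m/s^2]
    "b": (1.5, 0.3),      # comfortable deceleration [m/s^2]
    "s0": (2.0, 0.5),     # minimum standstill gap [m]
}

def idm_acceleration(v, v_lead, gap, v0, T, a_max, b, s0, delta=4.0):
    """IDM acceleration of a follower at speed v behind a leader at v_lead."""
    dv = v - v_lead                                      # closing speed (> 0 when approaching)
    s_star = s0 + v * T + v * dv / (2.0 * math.sqrt(a_max * b))
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)

def sample_idm_params(rng, dists=HYPOTHETICAL_PARAM_DISTS, floor=0.1):
    """Draw one background vehicle's IDM parameters, clipped to stay positive."""
    return {name: max(floor, float(rng.normal(mu, sd))) for name, (mu, sd) in dists.items()}

rng = np.random.default_rng(42)
params = sample_idm_params(rng)
accel = idm_acceleration(v=20.0, v_lead=18.0, gap=30.0, **params)
```

Some implementations clip the dynamic term of s* at zero when the leader pulls away. That variant is omitted here to keep the canonical form.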
The subtleties of human perception, as measured by vision scientists through psychophysics, are important clues to the internal workings of visual recognition. For instance, measured reaction time can indicate whether a visual stimulus is easy or hard for a subject to recognize. In this paper, we consider how to incorporate psychophysical measurements of visual perception into the loss function of a deep neural network being trained for a recognition task, under the assumption that such information can enforce consistency with human behavior. As a case study to assess the viability of this approach, we look at the problem of handwritten document transcription. While good progress has been made towards automatically transcribing modern handwriting, significant challenges remain in transcribing historical documents. Here we work towards a comprehensive transcription solution for Medieval manuscripts that combines networks trained using our novel loss formulation with natural language processing elements. A baseline assessment demonstrates reliable performance on the standard IAM and RIMES datasets. Further, we show the feasibility of our approach on a previously published dataset and on a new dataset of digitized Latin manuscripts, originally produced by scribes in the Cloister of St. Gall around the middle of the 9th century.
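One simple way to fold reaction times into training is to scale each sample's cross-entropy by a weight that grows with the min-max normalized human reaction time. Stimuli that humans found harder then contribute more to the loss. This is offered as an assumption, not as this paper's exact formulation. The function name, the linear weighting, and `alpha` are hypothetical.

```python
import numpy as np

def psychophysical_weighted_ce(logits, labels, reaction_times, alpha=1.0):
    """Mean cross-entropy with per-sample weights 1 + alpha * normalized reaction time."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max(axis=1, keepdims=True)          # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    ce = -log_probs[np.arange(len(labels)), labels]                # per-sample cross-entropy
    rt = np.asarray(reaction_times, dtype=float)
    span = rt.max() - rt.min()
    rt_norm = (rt - rt.min()) / span if span > 0 else np.zeros_like(rt)
    weights = 1.0 + alpha * rt_norm                                # slower (harder) -> heavier
    return float(np.mean(weights * ce))

# Two equally uncertain predictions; the second stimulus took humans longer to recognize.
loss = psychophysical_weighted_ce([[0.0, 0.0], [0.0, 0.0]], [0, 1], [300.0, 900.0], alpha=1.0)
```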