Ziran Wang

Drive as You Speak: Enabling Human-Like Interaction with Large Language Models in Autonomous Vehicles

Sep 19, 2023
Can Cui, Yunsheng Ma, Xu Cao, Wenqian Ye, Ziran Wang

The future of autonomous vehicles lies in the convergence of human-centric design and advanced AI capabilities. Future autonomous vehicles will not only transport passengers but also interact with and adapt to their desires, making the journey comfortable, efficient, and pleasant. In this paper, we present a novel framework that leverages Large Language Models (LLMs) to enhance the decision-making processes of autonomous vehicles. By combining the LLMs' natural language capabilities and contextual understanding with specialized tool use, and by synergizing their reasoning and acting with the vehicle's various onboard modules, the framework seamlessly integrates the advanced language and reasoning capabilities of LLMs into autonomous vehicles. The proposed framework holds the potential to revolutionize the way autonomous vehicles operate, offering personalized assistance, continuous learning, and transparent decision-making, ultimately contributing to safer and more efficient autonomous driving technologies.
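
To make the loop concrete, here is a minimal, hypothetical sketch of such an LLM-in-the-loop command cycle. None of the names (VehicleTools, call_llm) come from the paper, and the LLM call is stubbed out; a real system would route the model's tool choice to actual perception, planning, and control modules.

```python
# Hypothetical sketch of an LLM-in-the-loop command cycle for a vehicle.
# All module names are illustrative, not the paper's actual API.

import json

class VehicleTools:
    """Stand-ins for onboard modules the LLM can invoke."""
    def set_cruise_speed(self, mph: float) -> str:
        return f"Cruise speed set to {mph} mph."
    def adjust_following_gap(self, seconds: float) -> str:
        return f"Following gap set to {seconds} s."

def call_llm(prompt: str) -> str:
    # Placeholder for a real LLM API call; here we fake a tool-use reply.
    return json.dumps({"tool": "set_cruise_speed", "args": {"mph": 55},
                       "explanation": "Slowing down for a smoother ride."})

def handle_utterance(utterance: str, tools: VehicleTools) -> str:
    prompt = f"Passenger said: {utterance!r}. Choose a vehicle tool."
    decision = json.loads(call_llm(prompt))
    result = getattr(tools, decision["tool"])(**decision["args"])
    # Transparent decision-making: report both the action and the rationale.
    return f"{result} ({decision['explanation']})"

print(handle_utterance("I'm feeling a bit carsick.", VehicleTools()))
```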

Federated Learning for Connected and Automated Vehicles: A Survey of Existing Approaches and Challenges

Aug 21, 2023
Vishnu Pandi Chellapandi, Liangqi Yuan, Christopher G. Brinton, Stanislaw H Zak, Ziran Wang

Machine learning (ML) is widely used for key tasks in Connected and Automated Vehicles (CAV), including perception, planning, and control. However, its reliance on vehicular data for model training presents significant challenges related to in-vehicle user privacy and the communication overhead generated by massive data volumes. Federated learning (FL) is a decentralized ML approach that enables multiple vehicles to collaboratively develop models, broadening learning from various driving environments, enhancing overall performance, and simultaneously preserving the privacy and security of local vehicle data. This survey paper presents a review of the advancements made in the application of FL for CAV (FL4CAV). First, centralized and decentralized frameworks of FL are analyzed, highlighting their key characteristics and methodologies. Second, diverse data sources, models, and data security techniques relevant to FL in CAVs are reviewed, emphasizing their significance in ensuring privacy and confidentiality. Third, specific and important applications of FL are explored, providing insight into the base models and datasets employed for each application. Finally, existing challenges for FL4CAV are listed and potential directions for future work are discussed to further enhance the effectiveness and efficiency of FL in the context of CAV.
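
As background for the frameworks surveyed, the sketch below shows the canonical FedAvg-style server aggregation step that most centralized FL4CAV pipelines build on. It is a toy illustration in which each "model" is a dict of scalar parameters rather than real tensors.

```python
# Minimal FedAvg-style server aggregation: a weighted average of client
# parameters by local dataset size. Plain-Python stand-in for tensor math.

def fedavg(client_models, client_sizes):
    """Weighted average of client parameters by local sample count."""
    total = sum(client_sizes)
    keys = client_models[0].keys()
    return {
        k: sum(m[k] * n for m, n in zip(client_models, client_sizes)) / total
        for k in keys
    }

vehicle_a = {"w": 0.2, "b": -0.1}   # trained on 800 local frames
vehicle_b = {"w": 0.6, "b": 0.3}    # trained on 200 local frames
print(fedavg([vehicle_a, vehicle_b], [800, 200]))  # {'w': 0.28, 'b': -0.02}
```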

Decentralized Federated Learning: A Survey and Perspective

Jun 02, 2023
Liangqi Yuan, Lichao Sun, Philip S. Yu, Ziran Wang

Federated learning (FL) has been gaining attention for its ability to share knowledge across clients while keeping user data local, thereby protecting privacy, increasing learning efficiency, and reducing communication overhead. Decentralized FL (DFL) is a network architecture that, in contrast to centralized FL (CFL), eliminates the need for a central server. DFL enables direct communication between clients, resulting in significant savings in communication resources. In this paper, a comprehensive survey of and perspective on DFL is provided. First, a review of the methodology, challenges, and variants of CFL is conducted, laying the background for DFL. Then, a systematic and detailed perspective on DFL is introduced, including iteration order, communication protocols, network topologies, paradigm proposals, and temporal variability. Next, based on the definition of DFL, several extended variants and categorizations are proposed with state-of-the-art technologies. Lastly, in addition to summarizing the current challenges in DFL, possible solutions and future research directions are also discussed.
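
For intuition, here is a toy sketch of one DFL communication pattern, gossip averaging over a ring topology: each client mixes its parameters with its neighbors' and no server is involved. The scalar "models" and mixing weight are illustrative assumptions, not a specific method from the survey.

```python
# Serverless gossip averaging over a ring topology; parameters are floats
# for brevity. Each round preserves the global mean while clients converge.

def gossip_round(params, mix=0.5):
    """Each client averages with its two ring neighbors; no central server."""
    n = len(params)
    return [
        (1 - mix) * params[i]
        + mix * 0.5 * (params[(i - 1) % n] + params[(i + 1) % n])
        for i in range(n)
    ]

weights = [1.0, 0.0, 0.0, 0.0]
for _ in range(10):
    weights = gossip_round(weights)
print(weights)  # drifts toward the global mean (0.25) without a server
```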

Radar Enlighten the Dark: Enhancing Low-Visibility Perception for Automated Vehicles with Camera-Radar Fusion

May 27, 2023
Can Cui, Yunsheng Ma, Juanwu Lu, Ziran Wang

Sensor fusion is a crucial augmentation technique for improving the accuracy and reliability of perception systems in automated vehicles under diverse driving conditions. However, adverse weather and low-light conditions remain challenging: sensor performance degrades significantly, exposing vehicle safety to potential risks. Advanced sensors such as LiDAR can help mitigate the issue, but at extremely high marginal cost. In this paper, we propose REDFormer, a novel transformer-based 3D object detection model that tackles low-visibility conditions with a more practical and cost-effective solution: bird's-eye-view camera-radar fusion. Using the nuScenes dataset with multi-radar point clouds, weather information, and time-of-day data, our model outperforms state-of-the-art (SOTA) models in classification and detection accuracy. Finally, we provide extensive ablation studies of each model component's contribution to addressing the above challenges. In particular, the experiments show that our model achieves a significant performance improvement over the baseline model in low-visibility scenarios, exhibiting a 31.31% increase in rainy scenes and a 46.99% enhancement in nighttime scenes. The source code of this study is publicly available.
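
The snippet below is a hedged sketch of the general BEV-level camera-radar fusion idea, not REDFormer's published architecture: per-modality bird's-eye-view feature maps are concatenated along the channel dimension and fused by a small convolutional block before a detection head. Channel sizes and layer choices are assumptions.

```python
# Generic BEV camera-radar feature fusion (illustrative, not REDFormer).

import torch
import torch.nn as nn

class BEVFusion(nn.Module):
    def __init__(self, cam_ch=64, radar_ch=16, out_ch=64):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(cam_ch + radar_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, cam_bev, radar_bev):
        # Both inputs share the same bird's-eye-view grid, e.g. (B, C, 128, 128).
        return self.fuse(torch.cat([cam_bev, radar_bev], dim=1))

fused = BEVFusion()(torch.randn(2, 64, 128, 128), torch.randn(2, 16, 128, 128))
print(fused.shape)  # torch.Size([2, 64, 128, 128])
```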

CEMFormer: Learning to Predict Driver Intentions from In-Cabin and External Cameras via Spatial-Temporal Transformers

May 13, 2023
Yunsheng Ma, Wenqian Ye, Xu Cao, Amr Abdelraouf, Kyungtae Han, Rohit Gupta, Ziran Wang

Driver intention prediction seeks to anticipate drivers' actions by analyzing their behaviors with respect to the surrounding traffic environment. Existing approaches primarily focus on late-fusion techniques and neglect the importance of maintaining consistency between predictions and prevailing driving contexts. In this paper, we introduce a new framework called Cross-View Episodic Memory Transformer (CEMFormer), which employs spatio-temporal transformers to learn unified memory representations for improved driver intention prediction. Specifically, we develop a spatio-temporal encoder that integrates information from both in-cabin and external camera views, along with episodic memory representations that continuously fuse historical data. Furthermore, we propose a novel context-consistency loss that incorporates driving context as an auxiliary supervision signal to improve prediction performance. Comprehensive experiments on the Brain4Cars dataset demonstrate that CEMFormer consistently outperforms existing state-of-the-art methods in driver intention prediction.
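
As a rough illustration (not the published CEMFormer code), the sketch below fuses token sequences from the two camera views together with learned episodic-memory tokens in a standard transformer encoder; the updated memory tokens can be carried into the next time step. All dimensions and module choices are assumptions.

```python
# Cross-view fusion with episodic-memory tokens (illustrative sketch).

import torch
import torch.nn as nn

class CrossViewMemoryEncoder(nn.Module):
    def __init__(self, dim=256, n_mem=8, n_heads=8, n_layers=2):
        super().__init__()
        self.memory = nn.Parameter(torch.zeros(1, n_mem, dim))
        layer = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.n_mem = n_mem

    def forward(self, cabin_tokens, external_tokens, memory=None):
        b = cabin_tokens.size(0)
        mem = self.memory.expand(b, -1, -1) if memory is None else memory
        # Joint attention over memory and both camera views.
        x = torch.cat([mem, cabin_tokens, external_tokens], dim=1)
        out = self.encoder(x)
        # Return updated memory (for the next step) and the fused tokens.
        return out[:, :self.n_mem], out[:, self.n_mem:]

enc = CrossViewMemoryEncoder()
mem, fused = enc(torch.randn(2, 16, 256), torch.randn(2, 16, 256))
print(mem.shape, fused.shape)  # (2, 8, 256) (2, 32, 256)
```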

M$^2$DAR: Multi-View Multi-Scale Driver Action Recognition with Vision Transformer

May 13, 2023
Yunsheng Ma, Liangqi Yuan, Amr Abdelraouf, Kyungtae Han, Rohit Gupta, Zihao Li, Ziran Wang

Ensuring traffic safety and preventing accidents are critical goals in daily driving, and advances in computer vision can be leveraged to achieve them. In this paper, we present M$^2$DAR, a multi-view, multi-scale framework for naturalistic driving action recognition and localization in untrimmed videos, with a particular focus on detecting distracted driving behaviors. Our system features a weight-sharing, multi-scale Transformer-based action recognition network that learns robust hierarchical representations. Furthermore, we propose a new election algorithm consisting of aggregation, filtering, merging, and selection processes to refine the preliminary results from the action recognition module across multiple views. Extensive experiments on the 7th AI City Challenge Track 3 dataset demonstrate the effectiveness of our approach, which achieves an overlap score of 0.5921 on the A2 test set. Our source code is available at \url{https://github.com/PurdueDigitalTwin/M2DAR}.
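
The election algorithm is described only at a high level here, so the following is a hedged reconstruction of its four stages (aggregation, filtering, merging, selection) over per-view candidate segments; the thresholds, merging rule, and selection criterion are my assumptions, not the paper's exact procedure.

```python
# Aggregate-filter-merge-select over multi-view action candidates (sketch).

def elect(candidates, min_score=0.5, gap=1.0):
    """candidates: list of (action, start_s, end_s, score) from all views."""
    by_action = {}
    for act, s, e, score in candidates:          # 1) aggregate across views
        if score >= min_score:                   # 2) filter weak detections
            by_action.setdefault(act, []).append((s, e, score))
    results = []
    for act, segs in by_action.items():
        segs.sort()
        merged = [list(segs[0])]
        for s, e, score in segs[1:]:             # 3) merge nearby segments
            if s - merged[-1][1] <= gap:
                merged[-1][1] = max(merged[-1][1], e)
                merged[-1][2] = max(merged[-1][2], score)
            else:
                merged.append([s, e, score])
        # 4) select the highest-scoring segment per action class
        results.append((act, *max(merged, key=lambda m: m[2])))
    return results

views = [("texting", 3.0, 6.0, 0.8), ("texting", 3.5, 7.0, 0.6),
         ("texting", 20.0, 22.0, 0.4), ("drinking", 10.0, 12.0, 0.9)]
print(elect(views))  # [('texting', 3.0, 7.0, 0.8), ('drinking', 10.0, 12.0, 0.9)]
```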

* Accepted at the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 

Peer-to-Peer Federated Continual Learning for Naturalistic Driving Action Recognition

Apr 14, 2023
Liangqi Yuan, Yunsheng Ma, Lu Su, Ziran Wang

Naturalistic driving action recognition (NDAR) has proven to be an effective method for detecting driver distraction and reducing the risk of traffic accidents. However, the intrusive design of in-cabin cameras raises concerns about driver privacy. To address this issue, we propose FedPC, a novel peer-to-peer (P2P) federated learning (FL) framework with continual learning, which ensures privacy and enhances learning efficiency while reducing communication, computational, and storage overheads. Our framework focuses on the clients' objectives within a serverless FL setting, with the goal of delivering personalized and accurate NDAR models. We demonstrate and evaluate the performance of FedPC on two real-world NDAR datasets: the State Farm Distracted Driver Detection dataset and the Track 3 NDAR dataset from the 2023 AI City Challenge. The results of our experiments highlight the strong competitiveness of FedPC against conventional client-to-server (C2S) FL frameworks in terms of performance, knowledge dissemination rate, and compatibility with new clients.
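
A conceptual sketch of the P2P idea follows: a model hops from peer to peer, and each vehicle continues training on its own private data before handing the parameters onward, so raw data never leaves a vehicle. The scalar model and ring schedule are illustrative, not FedPC's actual update rule.

```python
# Peer-to-peer federated continual learning, reduced to a toy scalar model.

def local_update(params, local_data, lr=0.1):
    """Toy continual step: nudge the scalar 'model' toward the local mean."""
    target = sum(local_data) / len(local_data)
    return params + lr * (target - params)

def p2p_round(params, peers):
    for peer_data in peers:          # ring schedule; data never leaves a peer
        params = local_update(params, peer_data)
    return params

peers = [[0.9, 1.1], [2.0, 2.2], [1.4, 1.6]]   # each peer's private data
model = 0.0
for _ in range(5):                              # several passes around the ring
    model = p2p_round(model, peers)
print(round(model, 3))
```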

* CVPRW 2023 

A Survey of Federated Learning for Connected and Automated Vehicles

Mar 19, 2023
Vishnu Pandi Chellapandi, Liangqi Yuan, Stanislaw H. Zak, Ziran Wang

Connected and Automated Vehicles (CAVs) are an emerging technology in the automotive domain with the potential to alleviate accidents, traffic congestion, and pollutant emissions, leading to a safe, efficient, and sustainable transportation system. Machine learning-based methods are widely used in CAVs for crucial tasks such as perception, motion planning, and motion control. However, these models are typically trained solely on local vehicle data, so their performance is uncertain when exposed to new environments or unseen conditions. Federated learning (FL) is an effective solution for CAVs that enables collaborative model development across multiple vehicles in a distributed learning framework. FL enables CAVs to learn from a wide range of driving environments and improve their overall performance while ensuring the privacy and security of local vehicle data. In this paper, we review the progress researchers have made in applying FL to CAVs. A broader view of the various data modalities and algorithms that have been implemented on CAVs is provided. Specific applications of FL are reviewed in detail, and an analysis of the challenges and future scope of research is presented.
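
Complementing the server-side aggregation sketched earlier, here is a toy view of one FL round from the vehicle client's perspective: train locally, then share only the parameter update and a sample count, never raw sensor data. The scalar model and update rule are illustrative.

```python
# One FL round as seen by a vehicle client (toy scalar model).

def client_round(global_params, local_data, lr=0.05, steps=10):
    """Local training; returns the parameter delta and local sample count."""
    w = global_params
    target = sum(local_data) / len(local_data)
    for _ in range(steps):
        w += lr * (target - w)       # gradient step toward the local optimum
    return w - global_params, len(local_data)   # share the delta, not the data

delta, n = client_round(global_params=0.0, local_data=[1.0, 1.2, 0.8])
print(delta, n)
```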

* 8 pages, 1 figure 

Metamobility: Connecting Future Mobility with Metaverse

Jan 17, 2023
Haoxin Wang, Ziran Wang, Dawei Chen, Qiang Liu, Hongyu Ke, Kyungtae Han

A metaverse is a perpetual, immersive, and shared digital universe that is linked to but extends beyond physical reality, and this emerging technology is attracting enormous attention across industries. In this article, we define the first holistic realization of the metaverse in the mobility domain, coined "metamobility". We present our vision of what metamobility will be and describe its basic architecture. We also propose two use cases, tactile live maps and meta-empowered advanced driver-assistance systems (ADAS), to demonstrate how metamobility will benefit and reshape future mobility systems. Each use case is discussed from the perspectives of technology evolution, future vision, and critical research challenges. Finally, we identify multiple concrete open research issues for the development and deployment of metamobility.

Federated Transfer-Ordered-Personalized Learning for Driver Monitoring Application

Jan 12, 2023
Liangqi Yuan, Lu Su, Ziran Wang

Federated learning (FL) shines in the Internet of Things (IoT) with its ability to realize collaborative learning and improve learning efficiency by sharing client model parameters trained on local data. Although FL has been successfully applied to various domains, including driver monitoring applications (DMA) on the Internet of Vehicles (IoV), its use still faces open issues such as data and system heterogeneity, the communication resources required for large-scale parallelism, malicious attacks, and data poisoning. This paper proposes a federated transfer-ordered-personalized learning (FedTOP) framework to address these problems and tests it on two real-world datasets, with and without system heterogeneity. The performance of the three extensions, transfer, ordered, and personalized, is compared in an ablation study, and the framework achieves 92.32% and 95.96% accuracy on the test clients of the two datasets, respectively. Compared to the baseline, this is a 462% improvement in accuracy and a 37.46% reduction in communication resource consumption. The results demonstrate that the proposed FedTOP can serve as a highly accurate, streamlined, privacy-preserving, cybersecurity-oriented, and personalized framework for DMA.
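
As one hedged reading of the "personalized" extension (my illustration, not the authors' code), the sketch below aggregates only a shared base across clients while each client keeps a private head that encodes driver-specific traits.

```python
# Personalized FL sketch: aggregate the shared base, keep private heads.

def aggregate_base(clients):
    """Average only the shared 'base' part; 'head' stays on each client."""
    n = len(clients)
    base = sum(c["base"] for c in clients) / n
    for c in clients:
        c["base"] = base                 # broadcast the shared representation
    return clients

clients = [{"base": 0.2, "head": 1.0},   # head holds driver-specific traits
           {"base": 0.8, "head": -0.5}]
print(aggregate_base(clients))
# [{'base': 0.5, 'head': 1.0}, {'base': 0.5, 'head': -0.5}]
```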
