Runsheng Xu

Automated Driving Systems Data Acquisition and Processing Platform

Nov 24, 2022
Xin Xia, Zonglin Meng, Xu Han, Hanzhao Li, Takahiro Tsukiji, Runsheng Xu, Zhaoliang Zhang, Jiaqi Ma

This paper presents an automated driving system (ADS) data acquisition and processing platform for vehicle trajectory extraction, reconstruction, and evaluation based on connected automated vehicle (CAV) cooperative perception. The platform provides a holistic pipeline from raw sensor data collection to data processing: it can process the sensor data from multiple CAVs and extract each object's identity (ID) number, position, speed, and orientation in both map and Frenet coordinates. First, the ADS data acquisition and analytics platform is presented. Specifically, the experimental CAV platform and sensor configuration are shown, and the processing software is introduced, including a deep-learning-based object detection algorithm using LiDAR information, a late fusion scheme that leverages cooperative perception to fuse the detected objects from multiple CAVs, and a multi-object tracking method. To further enhance the object detection and tracking results, high-definition maps consisting of point cloud and vector maps are generated and forwarded to a world model, which filters out off-road objects and extracts the objects' Frenet coordinates and lane information. In addition, a post-processing method is proposed to refine the trajectories produced by the object tracking algorithms. To tackle the ID switch issue of the object tracking algorithm, a fuzzy-logic-based approach is proposed to detect discontinuous trajectories of the same object. Finally, results for object detection, tracking, and late fusion are presented, and the post-processing algorithm's improvements in noise level and outlier removal are discussed, confirming the functionality and effectiveness of the proposed holistic data collection and processing platform.
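
The abstract above mentions extracting object positions in Frenet coordinates. As a rough illustration of what that conversion involves, the sketch below projects a map-frame point onto a lane centerline given as a polyline and returns the arc length s and signed lateral offset d. The function name `to_frenet`, the polyline representation, and the sign convention (positive d to the left of the travel direction) are assumptions made for this example, not details taken from the paper.

```python
import math

def to_frenet(point, centerline):
    """Project a map-frame (x, y) point onto a polyline and return (s, d).

    s: arc length along the centerline to the closest point.
    d: signed lateral offset, positive to the left of the travel direction.
    Hypothetical helper for illustration only.
    """
    px, py = point
    best = None      # (squared distance, s, d) of the closest projection so far
    s_start = 0.0    # arc length at the start of the current segment
    for (x0, y0), (x1, y1) in zip(centerline, centerline[1:]):
        dx, dy = x1 - x0, y1 - y0
        seg_len = math.hypot(dx, dy)
        if seg_len == 0.0:
            continue  # skip duplicate vertices
        # Parameter of the orthogonal projection, clamped to the segment.
        t = ((px - x0) * dx + (py - y0) * dy) / (seg_len ** 2)
        t = max(0.0, min(1.0, t))
        cx, cy = x0 + t * dx, y0 + t * dy
        dist_sq = (px - cx) ** 2 + (py - cy) ** 2
        # Sign of the 2D cross product: positive when the point lies to the left.
        cross = dx * (py - y0) - dy * (px - x0)
        d = math.copysign(math.sqrt(dist_sq), cross)
        if best is None or dist_sq < best[0]:
            best = (dist_sq, s_start + t * seg_len, d)
        s_start += seg_len
    if best is None:
        raise ValueError("centerline needs at least two distinct points")
    return best[1], best[2]

# An L-shaped centerline: 10 m along +x, then 10 m along +y.
lane = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
s_left, d_left = to_frenet((4.0, 2.0), lane)     # left of the first segment
s_right, d_right = to_frenet((12.0, 5.0), lane)  # right of the second segment
```

A production pipeline would more likely work on a smooth reference curve from the vector map and handle ambiguous projections near sharp corners; the piecewise-linear version is used here only because it is easy to verify by hand.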

Domain Adaptive Object Detection for Autonomous Driving under Foggy Weather

Oct 27, 2022
Jinlong Li, Runsheng Xu, Jin Ma, Qin Zou, Jiaqi Ma, Hongkai Yu

Most object detection methods for autonomous driving assume a consistent feature distribution between training and testing data, which is not always the case when weather conditions differ significantly. An object detection model trained under clear weather might not be effective enough in foggy weather because of the domain gap. This paper proposes a novel domain adaptive object detection framework for autonomous driving under foggy weather. Our method leverages both image-level and object-level adaptation to diminish the domain discrepancy in image style and object appearance. To further enhance the model's capabilities on challenging samples, we also introduce a new adversarial gradient reversal layer that performs adversarial mining for hard examples together with domain adaptation. Moreover, we propose to generate an auxiliary domain by data augmentation to enforce a new domain-level metric regularization. Experimental results on public benchmarks show the effectiveness and accuracy of the proposed method. The code is available at https://github.com/jinlong17/DA-Detect.

* Accepted by WACV2023. Code is available at https://github.com/jinlong17/DA-Detect
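
For readers unfamiliar with the gradient reversal idea that the abstract above builds on, here is a minimal, framework-free sketch of the standard gradient reversal layer used in adversarial domain adaptation: it passes features through unchanged in the forward pass and flips the sign of the gradient in the backward pass. This is the generic technique, not the paper's adversarial variant; the class name, the `lam` parameter, and the list-based interface are assumptions for illustration.

```python
class GradientReversal:
    """Identity in the forward pass; negates and scales gradients in the backward pass."""

    def __init__(self, lam=1.0):
        # Reversal strength; the default value is an arbitrary choice for this sketch.
        self.lam = lam

    def forward(self, features):
        # Features reach the domain classifier unchanged.
        return list(features)

    def backward(self, grad_output):
        # The feature extractor receives the negated gradient, so it is pushed
        # to make the two domains harder to tell apart.
        return [-self.lam * g for g in grad_output]

grl = GradientReversal(lam=0.5)
features_out = grl.forward([1.0, -2.0])
grads_in = grl.backward([0.2, -0.4])
```

In an autograd framework the same behavior would normally be written as a custom differentiable function rather than explicit forward and backward methods.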

Bridging the Domain Gap for Multi-Agent Perception

Oct 16, 2022
Runsheng Xu, Jinlong Li, Xiaoyu Dong, Hongkai Yu, Jiaqi Ma

Existing multi-agent perception algorithms usually share deep neural features extracted from raw sensing data between agents, achieving a trade-off between accuracy and the communication bandwidth limit. However, these methods assume all agents have identical neural networks, which might not be practical in the real world. The transmitted features can have a large domain gap when the models differ, leading to a dramatic performance drop in multi-agent perception. In this paper, we propose the first lightweight framework to bridge such domain gaps for multi-agent perception, which can be a plug-in module for most existing systems while maintaining confidentiality. Our framework consists of a learnable feature resizer to align features in multiple dimensions and a sparse cross-domain transformer for domain adaptation. Extensive experiments on the public multi-agent perception dataset V2XSet have demonstrated that our method can effectively bridge the gap for features from different domains and significantly outperform other baseline methods, by at least 8%, for point-cloud-based 3D object detection.

V2XP-ASG: Generating Adversarial Scenes for Vehicle-to-Everything Perception

Sep 27, 2022
Hao Xiang, Runsheng Xu, Xin Xia, Zhaoliang Zheng, Bolei Zhou, Jiaqi Ma

Recent advancements in Vehicle-to-Everything (V2X) communication technology have enabled autonomous vehicles to share sensory information to obtain better perception performance. With the rapid growth of autonomous vehicles and intelligent infrastructure, V2X perception systems will soon be deployed at scale, which raises a safety-critical question: how can we evaluate and improve their performance under challenging traffic scenarios before real-world deployment? Collecting diverse large-scale real-world test scenes seems to be the most straightforward solution, but it is expensive and time-consuming, and the collections can only cover limited scenarios. To this end, we propose the first open adversarial scene generator, V2XP-ASG, that can produce realistic, challenging scenes for modern LiDAR-based multi-agent perception systems. V2XP-ASG learns to construct an adversarial collaboration graph and simultaneously perturb multiple agents' poses in an adversarial and plausible manner. The experiments demonstrate that V2XP-ASG can effectively identify challenging scenes for a large range of V2X perception systems. Meanwhile, by training on the limited number of generated challenging scenes, the accuracy of V2X perception systems can be further improved by 12.3% on challenging scenes and 4% on normal scenes.

Diffusion Models: A Comprehensive Survey of Methods and Applications

Sep 15, 2022
Ling Yang, Zhilong Zhang, Shenda Hong, Runsheng Xu, Yue Zhao, Yingxia Shao, Wentao Zhang, Ming-Hsuan Yang, Bin Cui

Diffusion models are a class of deep generative models that have shown impressive results on various tasks with a solid theoretical foundation. Despite their demonstrated success over other state-of-the-art approaches, diffusion models often entail costly sampling procedures and sub-optimal likelihood estimation. Significant efforts have been made to improve the performance of diffusion models in various aspects. In this article, we present a comprehensive review of existing variants of diffusion models. Specifically, we provide a taxonomy of diffusion models and categorize them into three types: sampling-acceleration enhancement, likelihood-maximization enhancement, and data-generalization enhancement. We also introduce the other generative models (i.e., variational autoencoders, generative adversarial networks, normalizing flows, autoregressive models, and energy-based models) and discuss the connections between diffusion models and these generative models. Then we review the applications of diffusion models, including computer vision, natural language processing, waveform signal processing, multi-modal modeling, molecular graph generation, time series modeling, and adversarial purification. Furthermore, we propose new perspectives pertaining to the development of generative models. Github: https://github.com/YangLing0818/Diffusion-Models-Papers-Survey-Taxonomy.

* 33 pages, citing 255 papers, project: https://github.com/YangLing0818/Diffusion-Models-Papers-Survey-Taxonomy
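
To make the "costly sampling procedures" remark in the abstract above more concrete, the sketch below shows the forward (noising) process of a standard denoising diffusion model in closed form: x_t is drawn from a Gaussian with mean sqrt(alpha_bar_t) * x_0 and variance (1 - alpha_bar_t). Generation has to undo this chain step by step, which is why sampling is expensive. The function names, the list-of-floats data representation, and the linear schedule endpoints (1e-4 to 0.02, a commonly used default) are assumptions for illustration and are not taken from the survey.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed default endpoints)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def q_sample(x0, t, alpha_bars, rng=random):
    """Draw x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I); t is 0-indexed."""
    a = alpha_bars[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]

betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)  # seeded generator so the example is reproducible
x_early = q_sample([1.0, -1.0, 0.5], 10, alpha_bars, rng)   # still close to x0
x_late = q_sample([1.0, -1.0, 0.5], 999, alpha_bars, rng)   # nearly pure noise
```

Because alpha_bar_t shrinks toward zero as t grows, the signal term fades and the sample approaches a standard Gaussian; the sampling-acceleration methods the survey categorizes aim to shorten the reverse traversal of this chain.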

CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers

Jul 05, 2022
Runsheng Xu, Zhengzhong Tu, Hao Xiang, Wei Shao, Bolei Zhou, Jiaqi Ma

Bird's eye view (BEV) semantic segmentation plays a crucial role in spatial sensing for autonomous driving. Although recent literature has made significant progress on BEV map understanding, existing methods are all based on single-agent camera-based systems, which struggle to handle occlusions and detect distant objects in complex traffic scenes. Vehicle-to-Vehicle (V2V) communication technologies have enabled autonomous vehicles to share sensing information, which can dramatically improve perception performance and range compared to single-agent systems. In this paper, we propose CoBEVT, the first generic multi-agent multi-camera perception framework that can cooperatively generate BEV map predictions. To efficiently fuse camera features from multi-view and multi-agent data in an underlying Transformer architecture, we design a fused axial attention (FAX) module, which can capture sparse local and global spatial interactions across views and agents. Extensive experiments on the V2V perception dataset OPV2V demonstrate that CoBEVT achieves state-of-the-art performance for cooperative BEV semantic segmentation. Moreover, CoBEVT is shown to generalize to other tasks, including 1) BEV segmentation with single-agent multi-camera systems and 2) 3D object detection with multi-agent LiDAR systems, and achieves state-of-the-art performance with real-time inference speed.

Pik-Fix: Restoring and Colorizing Old Photos

May 11, 2022
Runsheng Xu, Zhengzhong Tu, Yuanqi Du, Xiaoyu Dong, Jinlong Li, Zibo Meng, Jiaqi Ma, Alan Bovik, Hongkai Yu

Restoring and inpainting the visual memories that are present, but often impaired, in old photos remains an intriguing but unsolved research topic. Decades-old photos often suffer from severe and commingled degradation such as cracks, defocus, and color-fading, which are difficult to treat individually and harder to repair when they interact. Deep learning presents a plausible avenue, but the lack of large-scale datasets of old photos makes addressing this restoration task very challenging. Here we present a novel reference-based end-to-end learning framework that is able to both repair and colorize old and degraded pictures. Our proposed framework consists of three modules: a restoration sub-network that conducts restoration from degradations, a similarity sub-network that performs color histogram matching and color transfer, and a colorization sub-network that learns to predict the chroma elements of images that have been conditioned on chromatic reference signals. The overall system makes use of color histogram priors from reference images, which greatly reduces the need for large-scale training data. We have also created a first-of-a-kind public dataset of real old photos that are paired with ground truth "pristine" photos that have been manually restored by Photoshop experts. We conducted extensive experiments on this dataset and synthetic datasets, and found that our method significantly outperforms previous state-of-the-art models using both qualitative comparisons and quantitative measurements.

* arXiv admin note: text overlap with arXiv:2202.02606
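
The abstract above refers to color histogram matching against a reference image. As a point of comparison, the sketch below implements the classical CDF-based histogram matching for a single 8-bit channel; the paper's similarity sub-network is a learned module and is not reproduced here. The function names and the flat list-of-integers image representation are assumptions for this example, and both inputs are assumed to be non-empty.

```python
from bisect import bisect_left

def cdf(values, levels=256):
    """Cumulative distribution of integer intensities in [0, levels)."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total, running, out = len(values), 0, []
    for count in hist:
        running += count
        out.append(running / total)
    return out

def match_histogram(source, reference, levels=256):
    """Remap source intensities so their distribution follows the reference."""
    src_cdf, ref_cdf = cdf(source, levels), cdf(reference, levels)
    # For each source level, pick the first reference level whose CDF reaches it.
    lookup = [min(bisect_left(ref_cdf, p), levels - 1) for p in src_cdf]
    return [lookup[v] for v in source]

dark_channel = [0, 0, 1, 2]             # hypothetical faded channel
reference_channel = [10, 20, 20, 30]    # hypothetical reference channel
matched = match_histogram(dark_channel, reference_channel)
```

Applied per channel (or on chroma channels only), this moves the tonal distribution of a degraded image toward that of the reference, which is the intuition behind using reference histograms as a prior.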

Pik-Fix: Restoring and Colorizing Old Photo

May 04, 2022
Runsheng Xu, Zhengzhong Tu, Yuanqi Du, Xiaoyu Dong, Jinlong Li, Zibo Meng, Jiaqi Ma, Alan Bovik, Hongkai Yu

Restoring and inpainting the visual memories that are present, but often impaired, in old photos remains an intriguing but unsolved research topic. Decades-old photos often suffer from severe and commingled degradation such as cracks, defocus, and color-fading, which are difficult to treat individually and harder to repair when they interact. Deep learning presents a plausible avenue, but the lack of large-scale datasets of old photos makes addressing this restoration task very challenging. Here we present a novel reference-based end-to-end learning framework that is able to both repair and colorize old and degraded pictures. Our proposed framework consists of three modules: a restoration sub-network that conducts restoration from degradations, a similarity sub-network that performs color histogram matching and color transfer, and a colorization sub-network that learns to predict the chroma elements of images that have been conditioned on chromatic reference signals. The overall system makes use of color histogram priors from reference images, which greatly reduces the need for large-scale training data. We have also created a first-of-a-kind public dataset of real old photos that are paired with ground truth "pristine" photos that have been manually restored by Photoshop experts. We conducted extensive experiments on this dataset and synthetic datasets, and found that our method significantly outperforms previous state-of-the-art models using both qualitative comparisons and quantitative measurements.

* arXiv admin note: text overlap with arXiv:2202.02606

Model-Agnostic Multi-Agent Perception Framework

Mar 24, 2022
Weizhe Chen, Runsheng Xu, Hao Xiang, Lantao Liu, Jiaqi Ma

Existing multi-agent perception systems assume that every agent utilizes the same models with identical parameters and architecture, which is often impractical in the real world. The significant performance boost brought by the multi-agent system can be degraded dramatically when the perception models are noticeably different. In this work, we propose a model-agnostic multi-agent framework to reduce the negative effect caused by model discrepancies and maintain confidentiality. Specifically, we consider the perception heterogeneity between agents by integrating a novel uncertainty calibrator which can eliminate the bias among agents' predicted confidence scores. Each agent performs such calibration independently on a standard public database, and therefore the intellectual property can be protected. To further refine the detection accuracy, we also propose a new algorithm called Promotion-Suppression Aggregation (PSA) that considers not only the confidence score of proposals but also the spatial agreement of their neighbors. Our experiments emphasize the necessity of model calibration across different agents, and the results show that our proposed approach outperforms the state-of-the-art baseline methods for 3D object detection on the open OPV2V dataset.

V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer

Mar 20, 2022
Runsheng Xu, Hao Xiang, Zhengzhong Tu, Xin Xia, Ming-Hsuan Yang, Jiaqi Ma

In this paper, we investigate the application of Vehicle-to-Everything (V2X) communication to improve the perception performance of autonomous vehicles. We present a robust cooperative perception framework with V2X communication using a novel vision Transformer. Specifically, we build a holistic attention model, namely V2X-ViT, to effectively fuse information across on-road agents (i.e., vehicles and infrastructure). V2X-ViT consists of alternating layers of heterogeneous multi-agent self-attention and multi-scale window self-attention, which capture inter-agent interactions and per-agent spatial relationships, respectively. These key modules are designed in a unified Transformer architecture to handle common V2X challenges, including asynchronous information sharing, pose errors, and heterogeneity of V2X components. To validate our approach, we create a large-scale V2X perception dataset using CARLA and OpenCDA. Extensive experimental results demonstrate that V2X-ViT sets new state-of-the-art performance for 3D object detection and achieves robust performance even under harsh, noisy environments. The dataset, source code, and trained models will be open-sourced.
