Jun Li

FMRT: Learning Accurate Feature Matching with Reconciliatory Transformer

Oct 20, 2023

Xinyu Zhang, Li Wang, Zhiqiang Jiang, Kun Dai, Tao Xie, Lei Yang, Wenhao Yu, Yang Shen, Jun Li

Abstract: Local Feature Matching, an essential component of several computer vision tasks (e.g., structure from motion and visual localization), has been effectively addressed by Transformer-based methods. However, these methods only integrate long-range context information among keypoints with a fixed receptive field, which prevents the network from reconciling the importance of features with different receptive fields to realize complete image perception, hence limiting the matching accuracy. In addition, these methods utilize a conventional handcrafted encoding approach to integrate the positional information of keypoints into the visual descriptors, which limits the capability of the network to extract reliable positional encodings. In this study, we propose Feature Matching with Reconciliatory Transformer (FMRT), a novel Transformer-based detector-free method that reconciles different features with multiple receptive fields adaptively and utilizes parallel networks to realize reliable positional encoding. Specifically, FMRT proposes a dedicated Reconciliatory Transformer (RecFormer) that consists of a Global Perception Attention Layer (GPAL) to extract visual descriptors with different receptive fields and integrate global context information under various scales, a Perception Weight Layer (PWL) to measure the importance of various receptive fields adaptively, and a Local Perception Feed-forward Network (LPFFN) to extract deep aggregated multi-scale local feature representation. Extensive experiments demonstrate that FMRT yields extraordinary performance on multiple benchmarks, including pose estimation, visual localization, homography estimation, and image matching.

Affine Frequency Division Multiplexing With Index Modulation

Oct 19, 2023

Yiwei Tao, Miaowen Wen, Yao Ge, Jun Li

Abstract: Affine frequency division multiplexing (AFDM) is a new multicarrier technique based on chirp signals tailored for high-mobility communications, which can achieve full diversity. In this paper, we propose an index modulation (IM) scheme based on the framework of AFDM systems, named AFDM-IM. In the proposed AFDM-IM scheme, the information bits are carried by the activation state of the subsymbols in the discrete affine Fourier (DAF) domain in addition to the conventional constellation symbols. To efficiently perform IM, we divide the subsymbols in the DAF domain into several groups and consider both the localized and distributed strategies. An asymptotically tight upper bound on the average bit error rate (BER) of the maximum-likelihood detection in the presence of channel estimation errors is derived in closed form. Computer simulations are carried out to evaluate the performance of the proposed AFDM-IM scheme, and the results corroborate its superiority over the benchmark schemes in linear time-varying channels. We also evaluate the BER performance of the index and modulated bits for the AFDM-IM scheme with and without satisfying the full diversity condition of AFDM. The results show that the index bits have stronger diversity protection than the modulated bits even when the full diversity condition of AFDM is not satisfied.
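
The bit split described in the abstract (index bits carried by the activation pattern, plus constellation bits on the active subsymbols) can be illustrated with the generic index-modulation bit count. This is a minimal sketch, not the paper's mapping: the group size `n`, the number of active subsymbols `k`, the constellation order `M`, and the lexicographic pattern table are all assumptions chosen for illustration.

```python
from itertools import combinations
from math import comb, floor, log2


def bits_per_group(n: int, k: int, M: int) -> tuple:
    """Return (index_bits, symbol_bits) for one group.

    n: subsymbols per group, k: active subsymbols, M: constellation order.
    Generic index-modulation count; the parameters are hypothetical.
    """
    index_bits = floor(log2(comb(n, k)))  # bits conveyed by which subsymbols are active
    symbol_bits = k * int(log2(M))        # bits conveyed by the k constellation symbols
    return index_bits, symbol_bits


def activation_pattern(index_value: int, n: int, k: int) -> tuple:
    """Map an index value to a set of active positions.

    Uses a lexicographic table of k-subsets (an assumed mapping).
    Only the first 2**index_bits patterns are addressable.
    """
    index_bits = floor(log2(comb(n, k)))
    if not 0 <= index_value < 2 ** index_bits:
        raise ValueError("index_value does not fit in the available index bits")
    patterns = list(combinations(range(n), k))
    return patterns[index_value]
```

For example, `bits_per_group(4, 2, 4)` describes a group of four subsymbols with two active QPSK symbols; how groups are placed in the DAF domain (localized or distributed) is outside this sketch.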

Federated Meta-Learning for Few-Shot Fault Diagnosis with Representation Encoding

Oct 13, 2023

Jixuan Cui, Jun Li, Zhen Mei, Kang Wei, Sha Wei, Ming Ding, Wen Chen, Song Guo

Abstract: Deep learning-based fault diagnosis (FD) approaches require a large amount of training data, which are difficult to obtain since they are located across different entities. Federated learning (FL) enables multiple clients to collaboratively train a shared model with data privacy guaranteed. However, the domain discrepancy and data scarcity problems among clients deteriorate the performance of the global FL model. To tackle these issues, we propose a novel framework called representation encoding-based federated meta-learning (REFML) for few-shot FD. First, a novel training strategy based on representation encoding and meta-learning is developed. It harnesses the inherent heterogeneity among training clients, effectively transforming it into an advantage for out-of-distribution generalization on unseen working conditions or equipment types. Additionally, an adaptive interpolation method that calculates the optimal combination of local and global models as the initialization of local training is proposed. This helps to further utilize local information to mitigate the negative effects of domain discrepancy. As a result, high diagnostic accuracy can be achieved on unseen working conditions or equipment types with limited training data. Compared with the state-of-the-art methods, such as FedProx, the proposed REFML framework achieves an increase in accuracy by 2.17%-6.50% when tested on unseen working conditions of the same equipment type and 13.44%-18.33% when tested on totally unseen equipment types, respectively.
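
The idea of initializing local training from a combination of the local and global models can be sketched as a convex interpolation of parameters. The abstract does not say how the optimal combination is computed, so the grid search below is a stand-in assumption, and `interpolate`, `best_alpha`, and `loss_fn` are hypothetical names introduced only for illustration.

```python
import numpy as np


def interpolate(local: dict, global_: dict, alpha: float) -> dict:
    """Blend two parameter dicts: alpha * local + (1 - alpha) * global."""
    return {name: alpha * local[name] + (1.0 - alpha) * global_[name]
            for name in local}


def best_alpha(local: dict, global_: dict, loss_fn, candidates=None) -> float:
    """Pick the blend weight with the lowest loss on local data.

    Grid search is an assumed stand-in; the paper's actual rule for the
    optimal combination is not given in the abstract.
    """
    if candidates is None:
        candidates = np.linspace(0.0, 1.0, 11)
    losses = [loss_fn(interpolate(local, global_, a)) for a in candidates]
    return float(candidates[int(np.argmin(losses))])


# Toy usage: a one-parameter "model" whose ideal weight is 0.3.
local_model = {"w": np.array([0.0])}
global_model = {"w": np.array([1.0])}
toy_loss = lambda params: float(((params["w"] - 0.3) ** 2).sum())
alpha = best_alpha(local_model, global_model, toy_loss)
init_params = interpolate(local_model, global_model, alpha)
```

The blended `init_params` would then serve as the starting point for local training on the client.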

Revisiting Multi-modal 3D Semantic Segmentation in Real-world Autonomous Driving

Oct 13, 2023

Feng Jiang, Chaoping Tu, Gang Zhang, Jun Li, Hanqing Huang, Junyu Lin, Di Feng, Jian Pu

Abstract: LiDAR and camera are two critical sensors for multi-modal 3D semantic segmentation and are supposed to be fused efficiently and robustly to ensure safety in various real-world scenarios. However, existing multi-modal methods face two key challenges: 1) difficulty with efficient deployment and real-time execution; and 2) drastic performance degradation under weak calibration between LiDAR and cameras. To address these challenges, we propose CPGNet-LCF, a new multi-modal fusion framework extending the LiDAR-only CPGNet. CPGNet-LCF solves the first challenge by inheriting the easy deployment and real-time capabilities of CPGNet. For the second challenge, we introduce a novel weak calibration knowledge distillation strategy during training to improve the robustness against weak calibration. CPGNet-LCF achieves state-of-the-art performance on the nuScenes and SemanticKITTI benchmarks. Remarkably, it can be easily deployed to run in 20ms per frame on a single Tesla V100 GPU using TensorRT TF16 mode. Furthermore, we benchmark performance over four weak calibration levels, demonstrating the robustness of our proposed approach.

* 7 pages, 3 figures

Dynamic Appearance Particle Neural Radiance Field

Oct 11, 2023

Ancheng Lin, Jun Li

Abstract: Neural Radiance Fields (NeRFs) have shown great potential in modelling 3D scenes. Dynamic NeRFs extend this model by capturing time-varying elements, typically using deformation fields. The existing dynamic NeRFs employ a similar Eulerian representation for both light radiance and deformation fields. This leads to a close coupling of appearance and motion and lacks a physical interpretation. In this work, we propose Dynamic Appearance Particle Neural Radiance Field (DAP-NeRF), which introduces a particle-based representation to model the motions of visual elements in a dynamic 3D scene. DAP-NeRF consists of the superposition of a static field and a dynamic field. The dynamic field is quantised as a collection of appearance particles, each of which carries the visual information of a small dynamic element in the scene and is equipped with a motion model. All components, including the static field, the visual features and motion models of the particles, are learned from monocular videos without any prior geometric knowledge of the scene. We develop an efficient computational framework for the particle-based model. We also construct a new dataset to evaluate motion modelling. Experimental results show that DAP-NeRF is an effective technique to capture not only the appearance but also the physically meaningful motions in a 3D dynamic scene.

Performance Analysis of RIS-Aided Double Spatial Scattering Modulation for mmWave MIMO Systems

Oct 08, 2023

Xusheng Zhu, Wen Chen, Qingqing Wu, Jun Li, Nan Cheng, Fangjiong Chen, Changle Li

Abstract: In this paper, we investigate a practical structure of reconfigurable intelligent surface (RIS)-based double spatial scattering modulation (DSSM) for millimeter-wave (mmWave) multiple-input multiple-output (MIMO) systems. A suboptimal detector is proposed, in which the beam direction is first demodulated according to the received beam strength, and then the remaining information is demodulated by adopting the maximum likelihood algorithm. Based on the proposed suboptimal detector, we derive the conditional pairwise error probability expression. Further, the exact numerical integral and closed-form expressions of unconditional pairwise error probability (UPEP) are derived via two different approaches. To provide more insights, we derive the upper bound and asymptotic expressions of UPEP. In addition, the diversity gain of the RIS-DSSM scheme is also given. Furthermore, the union upper bound of average bit error probability (ABEP) is obtained by combining the UPEP and the number of error bits. Simulation results are provided to validate the derived upper bound and asymptotic expressions of ABEP. We find an interesting phenomenon: the ABEP performance of the proposed system based on phase shift keying is better than that based on quadrature amplitude modulation. Additionally, the ABEP performance advantage becomes more significant as the number of RIS elements increases.

Amazing Combinatorial Creation: Acceptable Swap-Sampling for Text-to-Image Generation

Oct 03, 2023

Jun Li, Zedong Zhang, Jian Yang

Abstract: Exploring a machine learning system to generate meaningful combinatorial object images from multiple textual descriptions, emulating human creativity, is a significant challenge as humans are able to construct amazing combinatorial objects, but machines strive to emulate data distribution. In this paper, we develop a straightforward yet highly effective technique called acceptable swap-sampling to generate a combinatorial object image that exhibits novelty and surprise, utilizing text concepts of different objects. Initially, we propose a swapping mechanism that constructs a novel embedding by exchanging column vectors of two text embeddings for generating a new combinatorial image through a cutting-edge diffusion model. Furthermore, we design an acceptable region by managing suitable CLIP distances between the new image and the original concept generations, increasing the likelihood of accepting the new image with a high-quality combination. This region allows us to efficiently sample a small subset from a new image pool generated by randomly exchanging column vectors. Lastly, we employ a segmentation method to compare CLIP distances among the segmented components, ultimately selecting the most promising object image from the sampled subset. Our experiments focus on text pairs of objects from ImageNet, and our results demonstrate that our approach outperforms recent methods such as Stable-Diffusion2, DALLE2, ERNIE-ViLG2 and Bing in generating novel and surprising object images, even when the associated concepts appear to be implausible, such as lionfish-abacus. Furthermore, during the sampling process, our approach, which requires neither training nor human preference data, is also comparable to PickScore and HPSv2, which are trained using human preference datasets.
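
The swapping mechanism (exchanging column vectors between two text embeddings) can be sketched in a few lines of NumPy. This is an illustrative sketch only: the `(tokens, dim)` array layout, the per-column random mask, and the `swap_prob` parameter are assumptions, and the CLIP-based acceptable region and segmentation steps from the abstract are not reproduced here.

```python
import numpy as np


def swap_columns(emb_a: np.ndarray, emb_b: np.ndarray,
                 rng: np.random.Generator, swap_prob: float = 0.5):
    """Build a new embedding by copying randomly chosen columns of emb_b into emb_a.

    emb_a, emb_b: text embeddings of identical shape (tokens, dim); this layout
    is an assumption. Returns the mixed embedding and the boolean column mask.
    """
    if emb_a.shape != emb_b.shape:
        raise ValueError("embeddings must have the same shape")
    mask = rng.random(emb_a.shape[1]) < swap_prob  # one decision per column
    mixed = emb_a.copy()
    mixed[:, mask] = emb_b[:, mask]
    return mixed, mask


# Toy usage with placeholder embeddings (zeros vs. ones) to make the swap visible.
rng = np.random.default_rng(0)
emb_a = np.zeros((3, 8))
emb_b = np.ones((3, 8))
mixed, mask = swap_columns(emb_a, emb_b, rng)
```

Repeating the call with fresh random masks would yield a pool of candidate embeddings, each of which could be passed to a diffusion model to generate one candidate image.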

* Project page: https://asst2i.github.io/anon/

MonoGAE: Roadside Monocular 3D Object Detection with Ground-Aware Embeddings

Sep 30, 2023

Lei Yang, Jiaxin Yu, Xinyu Zhang, Jun Li, Li Wang, Yi Huang, Chuang Zhang, Hong Wang, Yiming Li

Abstract: Although the majority of recent autonomous driving systems concentrate on developing perception methods based on ego-vehicle sensors, there is an overlooked alternative approach that involves leveraging intelligent roadside cameras to help extend the ego-vehicle perception ability beyond the visual range. We discover that most existing monocular 3D object detectors rely on the ego-vehicle prior assumption that the optical axis of the camera is parallel to the ground. However, the roadside camera is installed on a pole with a pitched angle, which makes the existing methods not optimal for roadside scenes. In this paper, we introduce a novel framework for Roadside Monocular 3D object detection with ground-aware embeddings, named MonoGAE. Specifically, the ground plane is a stable and strong prior knowledge due to the fixed installation of cameras in roadside scenarios. In order to reduce the domain gap between the ground geometry information and high-dimensional image features, we employ a supervised training paradigm with a ground plane to predict high-dimensional ground-aware embeddings. These embeddings are subsequently integrated with image features through cross-attention mechanisms. Furthermore, to improve the detector's robustness to the divergences in cameras' installation poses, we replace the ground plane depth map with a novel pixel-level refined ground plane equation map. Our approach demonstrates a substantial performance advantage over all previous monocular 3D object detectors on widely recognized 3D detection benchmarks for roadside cameras. The code and pre-trained models will be released soon.

* 12 pages, 6 figures

BEVHeight++: Toward Robust Visual Centric 3D Object Detection

Sep 28, 2023

Lei Yang, Tao Tang, Jun Li, Peng Chen, Kun Yuan, Li Wang, Yi Huang, Xinyu Zhang, Kaicheng Yu

Abstract: While most recent autonomous driving systems focus on developing perception methods on ego-vehicle sensors, people tend to overlook an alternative approach that leverages intelligent roadside cameras to extend the perception ability beyond the visual range. We discover that the state-of-the-art vision-centric bird's eye view detection methods have inferior performance on roadside cameras. This is because these methods mainly focus on recovering the depth relative to the camera center, where the depth difference between the car and the ground quickly shrinks as the distance increases. In this paper, we propose a simple yet effective approach, dubbed BEVHeight++, to address this issue. In essence, we regress the height to the ground to achieve a distance-agnostic formulation to ease the optimization process of camera-only perception methods. By incorporating both height and depth encoding techniques, we achieve a more accurate and robust projection from 2D to BEV spaces. On popular 3D detection benchmarks of roadside cameras, our method surpasses all previous vision-centric methods by a significant margin. In the ego-vehicle scenario, our BEVHeight++ is superior to depth-only methods. Specifically, it yields a notable improvement of +1.9% NDS and +1.1% mAP over BEVDepth when evaluated on the nuScenes validation set. Moreover, on the nuScenes test set, our method achieves substantial advancements, with an increase of +2.8% NDS and +1.7% mAP, respectively.
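
The claim that the depth difference between a car and the ground shrinks with distance, while height stays distance-agnostic, can be checked with simple geometry. This is a sketch under stated assumptions: depth is taken as the Euclidean distance to the camera center, the camera is a point at height `cam_height` above flat ground, and all numeric values are hypothetical rather than taken from the paper.

```python
import math


def depth_gap(cam_height: float, obj_height: float, ground_dist: float) -> float:
    """Depth difference between a ground point and an object's top at the same range.

    Depth is modeled as Euclidean distance to the camera center (an assumption).
    ground_dist is the horizontal distance from the camera pole.
    """
    depth_ground = math.hypot(ground_dist, cam_height)
    depth_top = math.hypot(ground_dist, cam_height - obj_height)
    return depth_ground - depth_top


# Hypothetical roadside setup: camera 6 m above ground, car roof at 1.5 m.
for dist in (10.0, 50.0, 100.0):
    gap = depth_gap(cam_height=6.0, obj_height=1.5, ground_dist=dist)
    print(f"range {dist:5.1f} m: depth gap {gap:.3f} m, height gap 1.500 m")
```

Under this model the depth gap decays toward zero as the range grows, whereas the height of the roof above the ground stays fixed at every range, which is the intuition behind regressing height instead of depth.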

* arXiv admin note: substantial text overlap with arXiv:2303.08498

Federated Learning in Intelligent Transportation Systems: Recent Applications and Open Problems

Sep 20, 2023

Shiying Zhang, Jun Li, Long Shi, Ming Ding, Dinh C. Nguyen, Wuzheng Tan, Jian Weng, Zhu Han

Abstract: Intelligent transportation systems (ITSs) have been fueled by the rapid development of communication technologies, sensor technologies, and the Internet of Things (IoT). Nonetheless, due to the dynamic characteristics of vehicle networks, it is rather challenging to make timely and accurate decisions about vehicle behaviors. Moreover, in the presence of mobile wireless communications, the privacy and security of vehicle information are at constant risk. In this context, a new paradigm is urgently needed for various applications in dynamic vehicle environments. As a distributed machine learning technology, federated learning (FL) has received extensive attention due to its outstanding privacy protection properties and easy scalability. We conduct a comprehensive survey of the latest developments in FL for ITS. Specifically, we first examine the prevalent challenges in ITS and elucidate the motivations for applying FL from various perspectives. Subsequently, we review existing deployments of FL in ITS across various scenarios, and discuss specific potential issues in object recognition, traffic management, and service provision scenarios. Furthermore, we analyze the new challenges introduced by FL deployment and the inherent limitations that FL alone cannot fully address, including uneven data distribution, limited storage and computing power, and potential privacy and security concerns. We then examine the existing collaborative technologies that can help mitigate these challenges. Lastly, we discuss the open challenges that remain to be addressed in applying FL in ITS and propose several future research directions.
