Yuan Xie

LiDAR-Camera Panoptic Segmentation via Geometry-Consistent and Semantic-Aware Alignment

Aug 11, 2023
Zhiwei Zhang, Zhizhong Zhang, Qian Yu, Ran Yi, Yuan Xie, Lizhuang Ma


3D panoptic segmentation is a challenging perception task that requires both semantic segmentation and instance segmentation. In this task, we notice that images provide rich texture, color, and discriminative information that can complement LiDAR data and yield clear performance improvements, but fusing the two modalities remains a challenging problem. To this end, we propose LCPS, the first LiDAR-Camera Panoptic Segmentation network. In our approach, we conduct LiDAR-Camera fusion in three stages: 1) an Asynchronous Compensation Pixel Alignment (ACPA) module that calibrates the coordinate misalignment caused by asynchronous problems between sensors; 2) a Semantic-Aware Region Alignment (SARA) module that extends the one-to-one point-pixel mapping to one-to-many semantic relations; 3) a Point-to-Voxel feature Propagation (PVP) module that integrates both geometric and semantic fusion information for the entire point cloud. Our fusion strategy improves PQ by about 6.9% over the LiDAR-only baseline on the NuScenes dataset. Extensive quantitative and qualitative experiments further demonstrate the effectiveness of our novel framework. The code will be released at https://github.com/zhangzw12319/lcps.git.
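
The abstract does not spell out the fusion mechanics, but the one-to-one point-pixel mapping that SARA extends generally amounts to projecting LiDAR points through the sensor calibration and sampling image features at the resulting pixels. Below is a minimal PyTorch sketch of that step, assuming an extrinsic matrix `T_cam_lidar`, intrinsics `K`, and an image feature map are available; it is an illustration of the general mechanism, not the LCPS implementation.

```python
import torch
import torch.nn.functional as F

def project_points_to_image(points, T_cam_lidar, K):
    """Project LiDAR points (N, 3) into pixel coordinates (N, 2).

    T_cam_lidar: (4, 4) extrinsic matrix, K: (3, 3) camera intrinsics.
    Returns pixel coordinates and a mask of points in front of the camera.
    """
    homo = torch.cat([points, torch.ones_like(points[:, :1])], dim=1)  # (N, 4)
    cam = (T_cam_lidar @ homo.T).T[:, :3]                              # points in camera frame
    mask = cam[:, 2] > 1e-3                                            # keep points in front of the camera
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3].clamp(min=1e-3)                        # perspective division
    return uv, mask

def gather_pixel_features(uv, mask, img_feat, H, W):
    """Bilinearly sample per-point features from an image feature map (C, Hf, Wf).

    H, W are the original image dimensions; the feature map is assumed to be
    spatially aligned with the image.
    """
    grid = torch.stack([uv[:, 0] / (W - 1) * 2 - 1,
                        uv[:, 1] / (H - 1) * 2 - 1], dim=-1)           # normalize to [-1, 1]
    grid = grid.view(1, 1, -1, 2)
    feat = F.grid_sample(img_feat.unsqueeze(0), grid, align_corners=True)
    feat = feat.view(img_feat.shape[0], -1).T                          # (N, C) per-point features
    return feat * mask.unsqueeze(-1)                                   # zero out invalid points
```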

* Accepted to ICCV 2023

Underwater Acoustic Target Recognition based on Smoothness-inducing Regularization and Spectrogram-based Data Augmentation

Jun 12, 2023
Ji Xu, Yuan Xie, Wenchao Wang


Underwater acoustic target recognition is a challenging task owing to intricate underwater environments and limited data availability. Insufficient data can hinder the ability of recognition systems to support complex modeling, thus impeding their advancement. To improve the generalization capacity of recognition models, techniques such as data augmentation have been employed to simulate underwater signals and diversify the data distribution. However, the complexity of underwater environments can cause the simulated signals to deviate from real scenarios, resulting in biased models that are misled by unrealistic data. In this study, we propose two strategies to enhance the generalization ability of models under limited data while avoiding the risk of performance degradation. First, as an alternative to traditional data augmentation, we utilize smoothness-inducing regularization, which incorporates simulated signals only in the regularization term. Additionally, we propose a specialized spectrogram-based data augmentation strategy, local masking and replicating (LMR), to capture inter-class relationships. Our experiments and visualization analysis demonstrate the superiority of our proposed strategies.
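
As a rough illustration of the first strategy, here is a minimal PyTorch sketch of a smoothness-inducing regularizer in which the simulated spectrogram enters only through a KL term, while the supervised cross-entropy is computed on real data alone; the exact formulation in the paper may differ, and `model` is a placeholder classifier.

```python
import torch
import torch.nn.functional as F

def smoothness_regularized_loss(model, spec_real, labels, spec_sim, alpha=1.0):
    """Cross-entropy on real spectrograms plus a KL smoothness term on simulated ones.

    spec_real, spec_sim: (B, 1, F, T) real and simulated spectrograms.
    """
    logits_real = model(spec_real)
    ce = F.cross_entropy(logits_real, labels)

    # Simulated signals appear only in the regularizer, not in the supervised loss.
    logits_sim = model(spec_sim)
    kl = F.kl_div(F.log_softmax(logits_sim, dim=-1),
                  F.softmax(logits_real.detach(), dim=-1),
                  reduction="batchmean")
    return ce + alpha * kl
```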

* Ocean Engineering, 2023, 281: 114926  

Underwater-Art: Expanding Information Perspectives With Text Templates For Underwater Acoustic Target Recognition

May 31, 2023
Yuan Xie, Jiawei Ren, Ji Xu


Underwater acoustic target recognition is an intractable task due to complex acoustic source characteristics and sound propagation patterns. Limited by insufficient data and a narrow information perspective, recognition models based on deep learning remain far from satisfactory in practical underwater scenarios. Although underwater acoustic signals are severely influenced by distance, channel depth, and other factors, annotations of such relevant information are often non-uniform, incomplete, and hard to use. In this work, we propose Underwater Acoustic Recognition based on Templates made up of rich relevant information (hereinafter called "UART"). We design templates that integrate relevant information from different perspectives into descriptive natural language. UART adopts an audio-spectrogram-text tri-modal contrastive learning framework, which enables descriptive natural language to guide the learning of acoustic representations. Our experiments reveal that UART has better recognition capability and generalization performance than traditional paradigms. Furthermore, the pre-trained UART model can provide superior prior knowledge for the recognition model in scenarios without any auxiliary annotation.
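
The abstract does not give the objective, but tri-modal contrastive frameworks of this kind are commonly built from CLIP-style symmetric InfoNCE terms over each pair of modalities. The sketch below shows that assumed structure for audio, spectrogram, and text embeddings; the encoders and the temperature value are placeholders, not the paper's actual loss.

```python
import torch
import torch.nn.functional as F

def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE between two batches of embeddings (B, D), matched by index."""
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.T / temperature
    targets = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets))

def trimodal_contrastive_loss(audio_emb, spec_emb, text_emb):
    """Sum of pairwise contrastive losses over audio, spectrogram, and text embeddings."""
    return (info_nce(audio_emb, text_emb)
            + info_nce(spec_emb, text_emb)
            + info_nce(audio_emb, spec_emb))
```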

* The Journal of the Acoustical Society of America, 2022, 152(5): 2641-2651  

Learning Music Sequence Representation from Text Supervision

May 31, 2023
Tianyu Chen, Yuan Xie, Shuai Zhang, Shaohan Huang, Haoyi Zhou, Jianxin Li

Music representation learning is notoriously difficult because complex, human-related concepts are embedded in sequences of numerical signals. To extract better MUsic SEquence Representations from labeled audio, we propose a novel text-supervision pre-training method, namely MUSER. MUSER adopts an audio-spectrum-text tri-modal contrastive learning framework, where the text input can be any form of meta-data with the help of text templates, while the spectrum is derived from the audio sequence. Our experiments reveal that MUSER can be more flexibly adapted to downstream tasks than current data-hungry pre-training methods, and it requires only 0.056% of the pre-training data to achieve state-of-the-art performance.
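
As a purely illustrative example of turning meta-data into text via templates (the actual templates and fields used by MUSER are not given in the abstract), a template layer could look like this:

```python
# Hypothetical templates; the wording and fields are assumptions for illustration.
TEMPLATES = {
    "genre": "A piece of {} music.",
    "instrument": "A recording featuring the {}.",
    "mood": "A track with a {} mood.",
}

def metadata_to_text(metadata: dict) -> str:
    """Turn whatever meta-data fields are available into one text description."""
    parts = [TEMPLATES[k].format(v) for k, v in metadata.items() if k in TEMPLATES]
    return " ".join(parts) if parts else "A piece of music."

print(metadata_to_text({"genre": "jazz", "instrument": "saxophone"}))
# -> "A piece of jazz music. A recording featuring the saxophone."
```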

* IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2022: 4583-4587  

Adaptive ship-radiated noise recognition with learnable fine-grained wavelet transform

May 31, 2023
Yuan Xie, Jiawei Ren, Ji Xu


Analyzing the ocean acoustic environment is a tricky task. Background noise and variable channel transmission environments make accurate ship-radiated noise recognition difficult. Existing recognition systems are weak at handling the variable underwater environment, leading to disappointing performance in practical applications. To keep the recognition system robust in various underwater environments, this work proposes an adaptive generalized recognition system, AGNet (Adaptive Generalized Network). By converting fixed wavelet parameters into fine-grained learnable parameters, AGNet learns the characteristics of underwater sound at different frequencies. Its flexible and fine-grained design is conducive to capturing more background acoustic information (e.g., background noise and the underwater transmission channel). To utilize the implicit information in wavelet spectrograms, AGNet adopts a convolutional neural network with parallel convolutional attention modules as the classifier. Experiments reveal that AGNet outperforms all baseline methods on several underwater acoustic datasets and benefits more from transfer learning. Moreover, AGNet shows robust performance against various interference factors.
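
To make "fixed wavelet parameters converted into fine-grained learnable parameters" concrete, here is a minimal PyTorch sketch of a front-end whose filters are Morlet-like wavelets with trainable center frequencies and bandwidths; this is an assumption about the general mechanism, not the AGNet architecture itself.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableWaveletFrontend(nn.Module):
    """Sketch of a learnable wavelet filterbank: each filter is a Morlet-like
    kernel whose center frequency and bandwidth are trainable parameters."""

    def __init__(self, n_filters=64, kernel_size=251, sample_rate=16000):
        super().__init__()
        self.kernel_size = kernel_size
        self.sample_rate = sample_rate
        # Center frequencies (Hz) spread over the band; bandwidths (Hz) start moderate.
        self.center_freq = nn.Parameter(torch.linspace(50.0, sample_rate / 2 - 50.0, n_filters))
        self.bandwidth = nn.Parameter(torch.full((n_filters,), 100.0))

    def forward(self, wav):                                    # wav: (B, 1, T)
        t = (torch.arange(self.kernel_size, device=wav.device, dtype=wav.dtype)
             - self.kernel_size // 2) / self.sample_rate       # time axis in seconds, (K,)
        f = self.center_freq.unsqueeze(1)                      # (N, 1)
        sigma = 1.0 / (2 * math.pi * self.bandwidth.abs().clamp(min=1.0)).unsqueeze(1)
        envelope = torch.exp(-0.5 * (t / sigma) ** 2)          # Gaussian window, (N, K)
        carrier = torch.cos(2 * math.pi * f * t)               # per-filter carrier, (N, K)
        kernels = (envelope * carrier).unsqueeze(1)            # conv weights, (N, 1, K)
        return F.conv1d(wav, kernels, padding=self.kernel_size // 2)  # (B, N, T)
```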

* Ocean Engineering, 2022, 265: 112626  

Advancing underwater acoustic target recognition via adaptive data pruning and smoothness-inducing regularization

Apr 24, 2023
Yuan Xie, Tianyu Chen, Ji Xu


Underwater acoustic recognition of ship-radiated signals has high practical value due to its ability to recognize non-line-of-sight targets. However, because data acquisition is difficult, the collected signals are scarce in quantity and mainly composed of mechanical periodic noise. In our experiments, we observe that the repeatability of periodic signals leads to a double-descent phenomenon, which indicates a significant local bias toward repeated samples. To address this issue, we propose a cross-entropy-based strategy to prune excessively similar segments from the training data. Furthermore, to compensate for the reduction in training data, we generate noisy samples and apply smoothness-inducing regularization based on KL divergence to mitigate overfitting. Experiments show that our proposed data pruning and regularization strategy brings stable benefits, and our framework significantly outperforms the state of the art in low-resource scenarios.
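
The abstract only names the pruning criterion, so the NumPy sketch below is one plausible reading rather than the paper's method: a segment is dropped when the cross-entropy between its normalized spectral distribution and that of an already-kept segment barely exceeds its own entropy, i.e. the two segments are near-duplicates. The threshold and features are placeholders.

```python
import numpy as np

def prune_similar_segments(segments, threshold=0.05):
    """Greedy sketch: drop a segment whose spectral distribution is nearly
    identical to an already-kept one, judged by cross-entropy minus entropy
    (equivalently KL divergence)."""
    kept = []
    for seg in segments:                              # each seg: 1-D non-negative spectrum
        p = seg / (seg.sum() + 1e-12)
        redundant = False
        for ref in kept:
            q = ref / (ref.sum() + 1e-12)
            cross_entropy = -(p * np.log(q + 1e-12)).sum()
            entropy = -(p * np.log(p + 1e-12)).sum()
            if cross_entropy - entropy < threshold:   # KL(p || q) small: near-duplicate
                redundant = True
                break
        if not redundant:
            kept.append(seg)
    return kept
```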


NPS: A Framework for Accurate Program Sampling Using Graph Neural Network

Apr 18, 2023
Yuanwei Fang, Zihao Liu, Yanheng Lu, Jiawei Liu, Jiajie Li, Yi Jin, Jian Chen, Yenkuang Chen, Hongzhong Zheng, Yuan Xie


With the end of Moore's Law, there is a growing demand for rapid architectural innovations in modern processors, such as RISC-V custom extensions, to continue performance scaling. Program sampling is a crucial step in microprocessor design, as it selects representative simulation points for workload simulation. While SimPoint has been the de-facto approach for decades, its limited expressiveness with Basic Block Vector (BBV) requires time-consuming human tuning, often taking months, which impedes fast innovation and agile hardware development. This paper introduces Neural Program Sampling (NPS), a novel framework that learns execution embeddings using dynamic snapshots of a Graph Neural Network. NPS deploys AssemblyNet for embedding generation, leveraging an application's code structures and runtime states. AssemblyNet serves as NPS's graph model and neural architecture, capturing a program's behavior in aspects such as data computation, code path, and data flow. AssemblyNet is trained with a data prefetch task that predicts consecutive memory addresses. In the experiments, NPS outperforms SimPoint by up to 63%, reducing the average error by 38%. Additionally, NPS demonstrates strong robustness with increased accuracy, reducing the expensive accuracy tuning overhead. Furthermore, NPS shows higher accuracy and generality than the state-of-the-art GNN approach in code behavior learning, enabling the generation of high-quality execution embeddings.
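
The selection step itself is not detailed in the abstract; below is a SimPoint-style sketch of how learned execution embeddings could stand in for Basic Block Vectors, clustering per-interval embeddings with k-means and keeping the interval closest to each centroid as a simulation point. The embedding source and cluster count are assumptions, not the NPS pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_simulation_points(embeddings, k=10, seed=0):
    """Cluster per-interval execution embeddings (n_intervals, dim) and return
    one representative interval index per cluster plus each cluster's weight
    (its share of all intervals), as in SimPoint-style sampling."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(embeddings)
    points, cluster_weights = [], []
    for c in range(k):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(embeddings[members] - km.cluster_centers_[c], axis=1)
        points.append(int(members[dists.argmin()]))       # interval closest to the centroid
        cluster_weights.append(len(members) / len(embeddings))
    return points, cluster_weights
```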


SpatialFormer: Semantic and Target Aware Attentions for Few-Shot Learning

Mar 15, 2023
Jinxiang Lai, Siqian Yang, Wenlong Wu, Tao Wu, Guannan Jiang, Xi Wang, Jun Liu, Bin-Bin Gao, Wei Zhang, Yuan Xie, Chengjie Wang


Recent Few-Shot Learning (FSL) methods emphasize generating discriminative embedding features to precisely measure the similarity between support and query sets. Current CNN-based cross-attention approaches generate discriminative representations by enhancing the mutually semantically similar regions of support and query pairs. However, this approach suffers from two problems: the CNN structure produces inaccurate attention maps based on local features, and mutually similar backgrounds cause distraction. To alleviate these problems, we design a novel SpatialFormer structure that generates more accurate attention regions based on global features. Unlike the traditional Transformer, which models intrinsic instance-level similarity and causes accuracy degradation in FSL, our SpatialFormer explores the semantic-level similarity between paired inputs to boost performance. We then derive two specific attention modules, SpatialFormer Semantic Attention (SFSA) and SpatialFormer Target Attention (SFTA), to enhance target object regions while reducing background distraction. In particular, SFSA highlights regions with the same semantic information between paired features, and SFTA finds potential foreground object regions of a novel feature that are similar to base categories. Extensive experiments show that our methods are effective and achieve new state-of-the-art results on few-shot classification benchmarks.
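
As a rough, simplified sketch of enhancing mutually similar regions between paired feature maps (not the actual SFSA/SFTA modules), a spatial cross-attention step could look like this in PyTorch:

```python
import torch
import torch.nn.functional as F

def spatial_cross_attention(query_feat, support_feat, temperature=1.0):
    """Re-weight each query location by its best cosine similarity to any
    support location, so mutually similar regions are enhanced.

    query_feat, support_feat: (B, C, H, W) feature maps.
    """
    B, C, H, W = query_feat.shape
    q = F.normalize(query_feat.flatten(2), dim=1)       # (B, C, HW)
    s = F.normalize(support_feat.flatten(2), dim=1)      # (B, C, HW)
    sim = torch.bmm(q.transpose(1, 2), s)                # (B, HW_q, HW_s) cosine similarities
    attn = sim.max(dim=-1).values / temperature          # best match per query location
    attn = torch.sigmoid(attn).view(B, 1, H, W)
    return query_feat * (1.0 + attn)                      # residual enhancement of attended regions
```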

* AAAI 2023  

High-Resolution GAN Inversion for Degraded Images in Large Diverse Datasets

Feb 07, 2023
Yanbo Wang, Chuming Lin, Donghao Luo, Ying Tai, Zhizhong Zhang, Yuan Xie


Recent decades have been marked by massive and diverse image data of increasingly high resolution and quality. However, some of the images we obtain may be corrupted, affecting perception and downstream applications. A generic method for generating a high-quality image from a degraded one is therefore in demand. In this paper, we present a novel GAN inversion framework that leverages the powerful generative ability of StyleGAN-XL for this problem. To ease the inversion challenge with StyleGAN-XL, we propose Clustering & Regularize Inversion (CRI). Specifically, the latent space is first divided into finer-grained sub-spaces by clustering. Instead of initializing the inversion with the average latent vector, we approximate a centroid latent vector from the clusters, which generates an image close to the input image. Then, an offset with a regularization term is introduced to keep the inverted latent vector within a certain range. We validate our CRI scheme on multiple restoration tasks (i.e., inpainting, colorization, and super-resolution) of complex natural images and show preferable quantitative and qualitative results. We further demonstrate that our technique is robust in terms of data and different GAN models. To the best of our knowledge, we are the first to adopt StyleGAN-XL for generating high-quality natural images from diverse degraded inputs. Code is available at https://github.com/Booooooooooo/CRI.
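
A minimal sketch of the clustering-then-regularized-offset idea, assuming a generator `G`, pre-computed cluster centroids in latent space, and a plain MSE reconstruction loss (the paper's actual objective for degraded inputs is more involved):

```python
import torch

def cri_invert(G, target, centroids, steps=200, lr=0.05, reg_weight=1e-3):
    """Start from the cluster centroid whose generated image is closest to the
    degraded target, then optimize a latent offset under an L2 regularizer that
    keeps the inverted code near that centroid.

    G: maps a latent vector (1, D) to an image; centroids: (K, D); target: image tensor.
    """
    with torch.no_grad():
        errs = [torch.mean((G(c.unsqueeze(0)) - target) ** 2) for c in centroids]
    w0 = centroids[int(torch.stack(errs).argmin())].unsqueeze(0)   # best-matching centroid

    offset = torch.zeros_like(w0, requires_grad=True)
    opt = torch.optim.Adam([offset], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        recon = G(w0 + offset)
        loss = torch.mean((recon - target) ** 2) + reg_weight * offset.pow(2).sum()
        loss.backward()
        opt.step()
    return w0 + offset.detach()
```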

* Accepted by AAAI 2023