Xinke Li

Ignoring Directionality Leads to Compromised Graph Neural Network Explanations

Jun 05, 2025

Changsheng Sun, Xinke Li, Jin Song Dong

Abstract:Graph Neural Networks (GNNs) are increasingly used in critical domains, where reliable explanations are vital for supporting human decision-making. However, the common practice of graph symmetrization discards directional information, leading to significant information loss and misleading explanations. Our analysis demonstrates how this practice compromises explanation fidelity. Through theoretical and empirical studies, we show that preserving directional semantics significantly improves explanation quality, ensuring more faithful insights for human decision-makers. These findings highlight the need for direction-aware GNN explainability in security-critical applications.

* 2025 IEEE Security and Privacy (Workshops)
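
The information loss described in the abstract above can be seen on a toy example. The following is a minimal sketch, not the authors' code: it assumes symmetrization means treating every directed edge as undirected, and the graphs and the helper name `symmetrize` are hypothetical.

```python
# Illustrative sketch only; graphs and names are made up for this example.

def symmetrize(adj):
    """Return the undirected version: i and j are linked if i->j or j->i exists."""
    n = len(adj)
    return [[1 if adj[i][j] or adj[j][i] else 0 for j in range(n)]
            for i in range(n)]

# Two different directed graphs: a chain 0->1->2 and its reverse 2->1->0.
chain = [[0, 1, 0],
         [0, 0, 1],
         [0, 0, 0]]
reverse_chain = [[0, 0, 0],
                 [1, 0, 0],
                 [0, 1, 0]]

sym_chain = symmetrize(chain)
sym_reverse = symmetrize(reverse_chain)

# The directed graphs differ, but their symmetrized forms are identical,
# so any explanation computed on the symmetrized graph cannot tell them apart.
print(chain == reverse_chain)    # False
print(sym_chain == sym_reverse)  # True
```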

Rethinking Gradient-based Adversarial Attacks on Point Cloud Classification

May 28, 2025

Jun Chen, Xinke Li, Mingyue Xu, Tianrui Li, Chongshou Li

Abstract:Gradient-based adversarial attacks have become a dominant approach for evaluating the robustness of point cloud classification models. However, existing methods often rely on uniform update rules that fail to consider the heterogeneous nature of point clouds, resulting in excessive and perceptible perturbations. In this paper, we rethink the design of gradient-based attacks by analyzing the limitations of conventional gradient update mechanisms and propose two new strategies to improve both attack effectiveness and imperceptibility. First, we introduce WAAttack, a novel framework that incorporates weighted gradients and an adaptive step-size strategy to account for the non-uniform contribution of points during optimization. This approach enables more targeted and subtle perturbations by dynamically adjusting updates according to the local structure and sensitivity of each point. Second, we propose SubAttack, a complementary strategy that decomposes the point cloud into subsets and focuses perturbation efforts on structurally critical regions. Together, these methods represent a principled rethinking of gradient-based adversarial attacks for 3D point cloud classification. Extensive experiments demonstrate that our approach outperforms state-of-the-art baselines in generating highly imperceptible adversarial examples. Code will be released upon paper acceptance.

Causal Self-supervised Pretrained Frontend with Predictive Code for Speech Separation

Apr 03, 2025

Wupeng Wang, Zexu Pan, Xinke Li, Shuai Wang, Haizhou Li

Abstract:Speech separation (SS) seeks to disentangle a multi-talker speech mixture into single-talker speech streams. Although SS can be generally achieved using offline methods, such a processing paradigm is not suitable for real-time streaming applications. Causal separation models, which rely only on past and present information, offer a promising solution for real-time streaming. However, these models typically suffer from notable performance degradation due to the absence of future context. In this paper, we introduce a novel frontend that is designed to mitigate the mismatch between training and run-time inference by implicitly incorporating future information into causal models through predictive patterns. The pretrained frontend employs a transformer decoder network with a causal convolutional encoder as the backbone and is pretrained in a self-supervised manner with two innovative pretext tasks: autoregressive hybrid prediction and contextual knowledge distillation. These tasks enable the model to capture predictive patterns directly from mixtures in a self-supervised manner. The pretrained frontend subsequently serves as a feature extractor to generate high-quality predictive patterns. Comprehensive evaluations on synthetic and real-world datasets validated the effectiveness of the proposed pretrained frontend.

* arXiv admin note: text overlap with arXiv:2411.03085

Controllable 3D Outdoor Scene Generation via Scene Graphs

Mar 10, 2025

Yuheng Liu, Xinke Li, Yuning Zhang, Lu Qi, Xin Li, Wenping Wang, Chongshou Li, Xueting Li, Ming-Hsuan Yang

Abstract:Three-dimensional scene generation is crucial in computer vision, with applications spanning autonomous driving, gaming and the metaverse. Current methods either lack user control or rely on imprecise, non-intuitive conditions. In this work, we propose a method that uses scene graphs, an accessible, user-friendly control format, to generate outdoor 3D scenes. We develop an interactive system that transforms a sparse scene graph into a dense BEV (Bird's Eye View) Embedding Map, which guides a conditional diffusion model to generate 3D scenes that match the scene graph description. During inference, users can easily create or modify scene graphs to generate large-scale outdoor scenes. We create a large-scale dataset with paired scene graphs and 3D semantic scenes to train the BEV embedding and diffusion models. Experimental results show that our approach consistently produces high-quality 3D urban scenes closely aligned with the input scene graphs. To the best of our knowledge, this is the first approach to generate 3D outdoor scenes conditioned on scene graphs.

* Project Page: https://yuheng.ink/project-page/control-3d-scene/

Speech Separation with Pretrained Frontend to Minimize Domain Mismatch

Nov 05, 2024

Wupeng Wang, Zexu Pan, Xinke Li, Shuai Wang, Haizhou Li

Abstract:Speech separation seeks to separate individual speech signals from a speech mixture. Typically, most separation models are trained on synthetic data due to the unavailability of target reference in real-world cocktail party scenarios. As a result, there exists a domain gap between real and synthetic data when deploying speech separation models in real-world applications. In this paper, we propose a self-supervised domain-invariant pretrained (DIP) frontend that is exposed to mixture data without the need for target reference speech. The DIP frontend utilizes a Siamese network with two innovative pretext tasks, mixture predictive coding (MPC) and mixture invariant coding (MIC), to capture shared contextual cues between real and synthetic unlabeled mixtures. Subsequently, we freeze the DIP frontend as a feature extractor when training the downstream speech separation models on synthetic data. By pretraining the DIP frontend with the contextual cues, we expect that the speech separation skills learned from synthetic data can be effectively transferred to real data. To benefit from the DIP frontend, we introduce a novel separation pipeline to align the feature resolution of the separation models. We evaluate the speech separation quality on standard benchmarks and real-world datasets. The results confirm the superiority of our DIP frontend over existing speech separation models. This study underscores the potential of large-scale pretraining to enhance the quality and intelligibility of speech separation in real-world applications.

* IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32 (2024), pp. 4184-4198

Enhancing Sampling Protocol for Robust Point Cloud Classification

Aug 22, 2024

Chongshou Li, Pin Tang, Xinke Li, Tianrui Li

Abstract:Established sampling protocols for 3D point cloud learning, such as Farthest Point Sampling (FPS) and Fixed Sample Size (FSS), have long been recognized and utilized. However, real-world data often suffer from corruptions such as sensor noise, which violate the benignness assumption of point clouds in current protocols. Consequently, they are notably vulnerable to noise, posing significant safety risks in critical applications like autonomous driving. To address these issues, we propose an enhanced point cloud sampling protocol, PointDR, which comprises two components: 1) Downsampling for key point identification and 2) Resampling for flexible sample size. Furthermore, differentiated strategies are implemented for the training and inference processes. In particular, an isolation-rated weight considering local density is designed for the downsampling method, assisting it in performing random key point selection in the training phase and bypassing noise in the inference phase. A local-geometry-preserved upsampling is incorporated into resampling, allowing it to maintain a stochastic sample size in the training stage and complete insufficient data in the inference stage. Notably, the proposed protocol requires no change to the model architecture and no extra learning, so replacing the existing protocol with it demands minimal effort. Despite its simplicity, it substantially improves the robustness of point cloud learning, as shown by outperforming state-of-the-art methods on multiple benchmarks of corrupted point cloud classification. The code will be available upon the paper's acceptance.
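
To illustrate why a standard protocol such as Farthest Point Sampling is sensitive to noise, here is a minimal sketch of the textbook greedy FPS procedure. It is not PointDR and not the authors' code; the function name, the toy point cloud, and the choice of starting index are assumptions made for this example.

```python
import math

def farthest_point_sampling(points, k, start=0):
    """Greedy FPS: repeatedly pick the point farthest from those already chosen."""
    chosen = [start]
    # dist[i] holds the distance from point i to its nearest chosen point so far.
    dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < min(k, len(points)):
        nxt = max(range(len(points)), key=lambda i: dist[i])
        chosen.append(nxt)
        dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(dist, points)]
    return chosen

# Four points near the origin plus one far-away outlier (index 4).
cloud = [(0, 0, 0), (0.1, 0, 0), (1, 0, 0), (0, 1, 0), (10, 10, 10)]
picked = farthest_point_sampling(cloud, 3)
print(picked)  # [0, 4, 2]
```

Because FPS favors points that are far from everything already selected, the isolated outlier is chosen on the very first step, which is the kind of behavior a noise-aware protocol needs to avoid.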

Pyramid Diffusion for Fine 3D Large Scene Generation

Nov 20, 2023

Yuheng Liu, Xinke Li, Xueting Li, Lu Qi, Chongshou Li, Ming-Hsuan Yang

Abstract:Directly transferring the 2D techniques to 3D scene generation is challenging due to significant resolution reduction and the scarcity of comprehensive real-world 3D scene datasets. To address these issues, our work introduces the Pyramid Discrete Diffusion model (PDD) for 3D scene generation. This novel approach employs a multi-scale model capable of progressively generating high-quality 3D scenes from coarse to fine. In this way, the PDD can generate high-quality scenes within limited resource constraints and does not require additional data sources. To the best of our knowledge, we are the first to adopt the simple but effective coarse-to-fine strategy for 3D large scene generation. Our experiments, covering both unconditional and conditional generation, have yielded impressive results, showcasing the model's effectiveness and robustness in generating realistic and detailed 3D scenes. Our code will be available to the public.

* Project page: https://yuheng.ink/project-page/pyramid-discrete-diffusion/

Gradient-based adaptive wavelet de-noising method for photoacoustic imaging in vivo

Jul 25, 2023

Xinke Li, Peng Ge, Yuting Shen, Feng Gao, Fei Gao

Abstract:Photoacoustic imaging (PAI) has been applied to many biomedical applications over the past decades. However, the received PA signal usually suffers from poor signal-to-noise ratio (SNR). Conventional solutions, such as employing a higher-power laser or long-time signal averaging, may increase system cost, time consumption, and tissue damage. Another strategy is de-noising algorithm design. In this paper, we propose a new de-noising method, termed gradient-based adaptive wavelet de-noising, which sets the energy gradient mutation point of low-frequency wavelet components as the threshold. We conducted simulation, ex vivo and in vivo experiments to validate the performance of the algorithm. The quality of the PA image/signal de-noised by our proposed algorithm improves by 20%-40% compared with traditional signal de-noising algorithms, yielding better contrast and clearer details. The proposed de-noising method has the potential to improve the SNR of PA signals under single-shot low-power laser illumination for biomedical applications in vivo.

Risk-optimized Outlier Removal for Robust Point Cloud Classification

Jul 20, 2023

Xinke Li, Junchi Lu

Abstract:The popularity of point cloud deep models for safety-critical purposes has increased, but the reliability and security of these models can be compromised by intentional or naturally occurring point cloud noise. To combat this issue, we present a novel point cloud outlier removal method called PointCVaR, which empowers standard-trained models to eliminate additional outliers and restore the data. Our approach begins by conducting attribution analysis to determine the influence of each point on the model output, which we refer to as point risk. We then optimize the process of filtering high-risk points using Conditional Value at Risk (CVaR) as the objective. The rationale for this approach is based on the observation that noise points in point clouds tend to cluster in the tail of the risk distribution, with a low frequency but a high level of risk, resulting in significant interference with classification results. Despite requiring no additional training effort, our method produces exceptional results in various removal-and-classification experiments for noisy point clouds, which are corrupted by random noise, adversarial noise, and backdoor trigger noise. Impressively, it achieves 87% accuracy in defense against the backdoor attack by removing triggers. Overall, the proposed PointCVaR effectively eliminates noise points and enhances point cloud classification, making it a promising plug-in module for various models in different scenarios.
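
The tail-risk idea in the abstract above can be sketched with an empirical Conditional Value at Risk computation. This is only an illustration under stated assumptions, not PointCVaR itself: the risk scores are made-up numbers rather than attribution results, the helper names are hypothetical, and the simple thresholding rule stands in for the optimization the paper describes.

```python
import math

def empirical_var(risks, alpha):
    """Empirical alpha-quantile (Value at Risk) of the per-point risk scores."""
    ordered = sorted(risks)
    # Small epsilon guards against floating-point error in alpha * n.
    k = max(0, math.ceil(alpha * len(ordered) - 1e-9) - 1)
    return ordered[k]

def empirical_cvar(risks, alpha):
    """Mean risk of the points strictly above the VaR threshold (the tail)."""
    var = empirical_var(risks, alpha)
    tail = [r for r in risks if r > var]
    return sum(tail) / len(tail) if tail else var

def drop_tail_points(risks, alpha):
    """Indices of points kept after removing the high-risk tail."""
    var = empirical_var(risks, alpha)
    return [i for i, r in enumerate(risks) if r <= var]

# Hypothetical risk scores: eight ordinary points and two high-risk points.
point_risk = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.2, 0.3, 5.0, 6.0]
var_80 = empirical_var(point_risk, 0.8)
cvar_80 = empirical_cvar(point_risk, 0.8)
kept = drop_tail_points(point_risk, 0.8)
print(var_80, cvar_80, kept)  # 0.3 5.5 [0, 1, 2, 3, 4, 5, 6, 7]
```

The two high-risk points are rare but dominate the tail average, which matches the abstract's observation that noise points have low frequency and high risk.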

Primitive3D: 3D Object Dataset Synthesis from Randomly Assembled Primitives

May 25, 2022

Xinke Li, Henghui Ding, Zekun Tong, Yuwei Wu, Yeow Meng Chee

Abstract:Numerous advancements in deep learning can be attributed to the access to large-scale and well-annotated datasets. However, such a dataset is prohibitively expensive in 3D computer vision due to the substantial collection cost. To alleviate this issue, we propose a cost-effective method for automatically generating a large number of 3D objects with annotations. In particular, we synthesize objects simply by assembling multiple random primitives. These objects are thus auto-annotated with part labels originating from primitives. This allows us to perform multi-task learning by combining the supervised segmentation with unsupervised reconstruction. Considering the large overhead of learning on the generated dataset, we further propose a dataset distillation strategy to remove redundant samples regarding a target dataset. We conduct extensive experiments for the downstream tasks of 3D object classification. The results indicate that our dataset, together with multi-task pretraining on its annotations, achieves the best performance compared to other commonly used datasets. Further study suggests that our strategy can improve the model performance through a pretraining and fine-tuning scheme, especially for small-scale datasets. In addition, pretraining with the proposed dataset distillation method can save 86% of the pretraining time with negligible performance degradation. We expect that our attempt provides a new data-centric perspective for training 3D deep models.

* CVPR 2022
