Yichu Xu

UniPCB: A Generation-Assisted Detection Framework for PCB Defect Inspection

May 06, 2026

Huan Zhang, Lianghong Tan, Yichu Xu, Jiangzhong Cao, Huanqi Wu, Linwei Zhu, Xu Zhang

Abstract: Printed Circuit Board (PCB) defect inspection faces two compounding challenges: scarce and imbalanced defect samples that limit model training, and insufficient feature representation under complex circuit backgrounds. Existing generation methods rely on single-modality conditions with coarse structural control, while detection methods improve architectures without addressing the data bottleneck. To resolve both challenges jointly, we propose a generation-assisted PCB defect inspection framework that integrates controlled defect synthesis with task-specific defect detection. On the generation side, a Multi-modal Condition Generator extracts complementary edge, depth, and text conditions in parallel. A ScaleEncoder then embeds these conditions into the diffusion U-Net at four resolutions, and a Condition Modulation module applies FiLM-style spatially-adaptive modulation at each scale, enabling structurally aligned and defect-aware sample synthesis. On the detection side, an Inverted Residual Shift Attention module couples self-attention with shift-wise convolution to jointly capture global context and local texture, and a Cross-level Complementary Fusion Block generates pixel-level gates for selective cross-level feature fusion. The synthesized samples directly enrich the detection training set, so that improvements in generation compound with improvements in detection. Extensive experiments on DsPCBSD+ demonstrate that UniPCB achieves an mAP@0.5 of 98.0% and an mAP@0.5:0.95 of 61.8% on defect detection, surpassing all compared methods, while the generation branch attains an FID of 129.61 and an SSIM of 0.619, outperforming existing conditional generation approaches.
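
The abstract describes the modulation step only as "FiLM-style" and "spatially-adaptive". As a rough illustration of that general idea (not the authors' implementation), the sketch below applies a per-pixel scale `gamma` and shift `beta` to a single-channel feature map. The function name `film_modulate` and the fixed `gamma`/`beta` values are assumptions made for the example; in a real model they would be predicted from the condition embeddings.

```python
def film_modulate(features, gamma, beta):
    """Apply out = gamma * x + beta element-wise over a 2-D feature map.

    All three arguments are lists of rows with the same shape. The names
    and the single-channel layout are hypothetical, chosen for clarity.
    """
    return [
        [g * x + b for x, g, b in zip(f_row, g_row, b_row)]
        for f_row, g_row, b_row in zip(features, gamma, beta)
    ]


# Toy 2x2 feature map with a different scale and shift at every pixel.
features = [[1.0, 2.0], [3.0, 4.0]]
gamma = [[1.0, 0.5], [2.0, 0.0]]   # assumed output of a condition network
beta = [[0.0, 1.0], [-1.0, 3.0]]   # assumed output of a condition network
modulated = film_modulate(features, gamma, beta)
```

Because `gamma` and `beta` vary per pixel rather than per channel, the condition can strengthen or suppress features at specific locations, which is what "spatially-adaptive" refers to here.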

MambaMoE: Mixture-of-Spectral-Spatial-Experts State Space Model for Hyperspectral Image Classification

Apr 29, 2025

Yichu Xu, Di Wang, Hongzan Jiao, Lefei Zhang, Liangpei Zhang

Abstract: The Mamba model has recently demonstrated strong potential in hyperspectral image (HSI) classification, owing to its ability to perform context modeling with linear computational complexity. However, existing Mamba-based methods usually neglect the spectral and spatial directional characteristics of heterogeneous objects in hyperspectral scenes, which limits classification performance. To address this issue, we propose MambaMoE, a novel spectral-spatial mixture-of-experts framework, representing the first MoE-based approach in the HSI classification community. Specifically, we design a Mixture of Mamba Expert Block (MoMEB) that leverages sparse expert activation to enable adaptive spectral-spatial modeling. Furthermore, we introduce an uncertainty-guided corrective learning (UGCL) strategy to direct the model's attention toward complex regions prone to prediction ambiguity. Extensive experiments on multiple public HSI benchmarks demonstrate that MambaMoE achieves state-of-the-art performance in both accuracy and efficiency compared with existing advanced approaches, particularly other Mamba-based methods. Code will be released.
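
To make "sparse expert activation" concrete, here is a minimal sketch of generic top-k mixture-of-experts gating. It is not the MoMEB design from the paper: the experts are toy scalar functions rather than Mamba blocks, the router logits are supplied by hand rather than computed by a learned router, and the names `top_k_gate` and `moe_forward` are hypothetical.

```python
import math


def top_k_gate(logits, k):
    """Return {expert_index: weight} for the k largest router logits.

    Weights are a softmax taken over the selected experts only, so they
    sum to 1 and every unselected expert gets no weight at all.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def moe_forward(x, experts, logits, k=2):
    """Evaluate only the selected experts and mix their outputs."""
    gates = top_k_gate(logits, k)
    return sum(weight * experts[i](x) for i, weight in gates.items())


# Four toy experts; only the two with the highest logits are evaluated.
experts = [lambda x: x, lambda x: 2 * x, lambda x: -x, lambda x: x * x]
logits = [0.0, 1.0, 1.0, -5.0]  # assumed router output for one input
y = moe_forward(3.0, experts, logits, k=2)
```

Only `k` experts run per input, so compute grows with `k` rather than with the total number of experts.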

HSANET: A Hybrid Self-Cross Attention Network For Remote Sensing Change Detection

Apr 21, 2025

Chengxi Han, Xiaoyu Su, Zhiqiang Wei, Meiqi Hu, Yichu Xu

Abstract: Change detection in remote sensing images is essential for large-scale monitoring. We propose HSANet, a network that uses hierarchical convolution to extract multi-scale features. It incorporates hybrid self-attention and cross-attention mechanisms to learn and fuse global and cross-scale information. This enables HSANet to capture global context at different scales and integrate cross-scale features, refining edge details and improving detection performance. We will also open-source our model code at https://github.com/ChengxiHAN/HSANet.

FedDyMem: Efficient Federated Learning with Dynamic Memory and Memory-Reduce for Unsupervised Image Anomaly Detection

Feb 28, 2025

Silin Chen, Kangjian Di, Yichu Xu, Han-Jia Ye, Wenhan Luo, Ningmu Zou

Abstract: Unsupervised image anomaly detection (UAD) has become a critical process in industrial and medical applications, but it faces growing challenges due to increasing concerns over data privacy. The limited class diversity inherent to one-class classification tasks, combined with distribution biases caused by variations in products across and within clients, poses significant challenges for preserving data privacy with federated UAD. This article therefore proposes an efficient federated learning method with dynamic memory and memory-reduce for unsupervised image anomaly detection, called FedDyMem. Because all client data in UAD belong to a single class (i.e., normal samples) and the intra-class feature distribution is significantly skewed, FedDyMem shares knowledge between the client and server through the client's dynamic memory bank instead of model parameters. On the local clients, a memory generator and a metric loss are employed to improve the consistency of the feature distribution for normal samples, leveraging the local model to update the memory bank dynamically. For efficient communication, a memory-reduce method based on weighted averages is proposed to significantly decrease the scale of memory banks. On the server, global memory is constructed through k-means aggregation and distributed to individual clients. Experiments conducted on six industrial and medical datasets, comprising a mixture of six products or health screening types derived from eleven public datasets, demonstrate the effectiveness of FedDyMem.
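
The abstract states only that memory-reduce is "based on weighted averages". One simple way such a reduction could work is sketched below, purely as an illustration: consecutive memory vectors are grouped and each group is replaced by its weighted mean. The fixed-size grouping, the source of the weights, and the name `reduce_memory` are all assumptions, not details taken from the paper.

```python
def reduce_memory(bank, weights, group_size):
    """Shrink a memory bank by replacing each group of vectors with one
    weighted average.

    bank: list of equal-length feature vectors.
    weights: one positive weight per vector (how they are obtained is an
        assumption left open here).
    group_size: number of consecutive vectors merged into one.
    """
    reduced = []
    for start in range(0, len(bank), group_size):
        vectors = bank[start:start + group_size]
        group_weights = weights[start:start + group_size]
        total = sum(group_weights)
        dim = len(vectors[0])
        reduced.append([
            sum(w * v[d] for w, v in zip(group_weights, vectors)) / total
            for d in range(dim)
        ])
    return reduced


# Four 2-D memory vectors merged pairwise into two.
bank = [[0.0, 0.0], [2.0, 4.0], [1.0, 1.0], [3.0, 5.0]]
weights = [1.0, 1.0, 3.0, 1.0]
reduced = reduce_memory(bank, weights, group_size=2)
```

With `group_size=2` the bank sent to the server is half its original size, which is the communication saving the abstract is pointing at.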

Selective Transformer for Hyperspectral Image Classification

Oct 07, 2024

Yichu Xu, Di Wang, Lefei Zhang, Liangpei Zhang

Abstract: Transformers have achieved satisfactory results in hyperspectral image (HSI) classification. However, existing Transformer models face two key challenges when dealing with HSI scenes characterized by diverse land cover types and rich spectral information: (1) a fixed receptive field representation overlooks useful contextual information, and (2) the self-attention feature representation is redundant. To address these limitations, we propose a novel Selective Transformer (SFormer) for HSI classification. The SFormer is designed to dynamically select receptive fields for capturing both spatial and spectral contextual information, while mitigating the impact of redundant data by prioritizing the most relevant features. This enables highly accurate classification of land cover in HSIs. Specifically, a Kernel Selective Transformer Block (KSTB) is first used to dynamically select an appropriate receptive field range for extracting spatial-spectral features. Furthermore, to capture the most crucial tokens, a Token Selective Transformer Block (TSTB) is introduced, which selects the most relevant tokens based on the ranking of attention scores for each query. Extensive experiments on four benchmark HSI datasets demonstrate that the proposed SFormer outperforms state-of-the-art HSI classification models. The code will be released.
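
The token selection idea, keeping only the highest-ranked attention scores for each query, can be sketched as generic top-k attention. This is an illustration under stated assumptions rather than the TSTB itself: it uses a single head, scaled dot-product scores, and a softmax over the kept tokens only, and the name `top_k_attention` is hypothetical.

```python
import math


def top_k_attention(queries, keys, values, k):
    """For each query, attend only to the k keys with the highest scores.

    Scores are scaled dot products. The softmax is taken over the kept
    keys only, so all other tokens contribute nothing to the output.
    """
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, key)) / scale for key in keys]
        keep = sorted(range(len(keys)), key=lambda j: scores[j], reverse=True)[:k]
        peak = max(scores[j] for j in keep)  # subtract the max for stability
        exps = [math.exp(scores[j] - peak) for j in keep]
        total = sum(exps)
        outputs.append([
            sum(e / total * values[j][c] for e, j in zip(exps, keep))
            for c in range(len(values[0]))
        ])
    return outputs


queries = [[1.0, 0.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
selected = top_k_attention(queries, keys, values, k=1)      # one token per query
full = top_k_attention(queries, keys, values, k=len(keys))  # ordinary attention
```

Setting `k` to the number of keys recovers ordinary softmax attention, so `k` directly controls how many low-ranked, potentially redundant tokens are discarded.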

Weight Scope Alignment: A Frustratingly Easy Method for Model Merging

Aug 22, 2024

Yichu Xu, Xin-Chun Li, Le Gan, De-Chuan Zhan

Abstract: Model merging has become a fundamental procedure in applications that value model efficiency and robustness. Training randomness and non-I.I.D. data pose a major challenge for averaging-based model fusion. Previous research has focused on element-wise regularization or neural permutations to enhance model averaging, while overlooking weight scope variations among models, which can significantly affect merging effectiveness. In this paper, we reveal variations in weight scope under different training conditions, shedding light on their influence on model merging. Fortunately, the parameters in each layer approximately follow a Gaussian distribution, which inspires a novel and simple regularization approach named Weight Scope Alignment (WSA). It contains two key components: 1) leveraging a target weight scope to guide the model training process, ensuring weight scope matching in the subsequent model merging; and 2) fusing the weight scopes of two or more models into a unified one for multi-stage model fusion. We extend the WSA regularization to two different scenarios, Mode Connectivity and Federated Learning. Extensive experiments validate the effectiveness of our approach.
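
The two components can be illustrated with a small sketch that treats a layer's "weight scope" as the mean and standard deviation of its weights, which is an assumption suggested by the Gaussian observation in the abstract. The squared-difference penalty and the plain averaging used to fuse scopes are simple stand-ins chosen for the example; the paper's actual regularizer and fusion rule may differ. All function names are hypothetical.

```python
from statistics import fmean, pstdev


def weight_scope(weights):
    """Summarise a layer's weights as (mean, standard deviation)."""
    return fmean(weights), pstdev(weights)


def scope_penalty(weights, target):
    """Assumed regulariser: squared distance between the layer's scope
    and a target scope, to be added to the training loss."""
    mean, std = weight_scope(weights)
    target_mean, target_std = target
    return (mean - target_mean) ** 2 + (std - target_std) ** 2


def fuse_scopes(scopes):
    """Assumed fusion rule: average the means and the standard deviations."""
    return fmean([s[0] for s in scopes]), fmean([s[1] for s in scopes])


# The same layer taken from two independently trained toy models.
layer_a = [-1.0, 1.0]
layer_b = [1.0, 3.0, 1.0, 3.0]
target = fuse_scopes([weight_scope(layer_a), weight_scope(layer_b)])
penalty_a = scope_penalty(layer_a, target)
```

Training both models with such a penalty toward the shared `target` would pull their layer statistics together before their weights are averaged.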

HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model

Jun 17, 2024

Di Wang, Meiqi Hu, Yao Jin, Yuchun Miao, Jiaqi Yang, Yichu Xu, Xiaolei Qin, Jiaqi Ma, Lingyu Sun, Chenxing Li (+12 more)

Abstract: Foundation models (FMs) are revolutionizing the analysis and understanding of remote sensing (RS) scenes, including aerial RGB, multispectral, and SAR images. However, FMs have seen little application to hyperspectral images (HSIs), which are rich in spectral information, and existing methods are often restricted to specific tasks and lack generality. To fill this gap, we introduce HyperSIGMA, a vision transformer-based foundation model for HSI interpretation, scalable to over a billion parameters. To tackle the spectral and spatial redundancy challenges in HSIs, we introduce a novel sparse sampling attention (SSA) mechanism, which effectively promotes the learning of diverse contextual features and serves as the basic block of HyperSIGMA. HyperSIGMA integrates spatial and spectral features using a specially designed spectral enhancement module. In addition, we construct a large-scale hyperspectral dataset, HyperGlobal-450K, for pre-training, which contains about 450K hyperspectral images, significantly surpassing existing datasets in scale. Extensive experiments on various high-level and low-level HSI tasks demonstrate HyperSIGMA's versatility and superior representational capability compared to current state-of-the-art methods. Moreover, HyperSIGMA shows significant advantages in scalability, robustness, cross-modal transfer capability, and real-world applicability.

* The code and models will be released at https://github.com/WHU-Sigma/HyperSIGMA
