Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jie Yang

Efficient Pretraining Model based on Multi-Scale Local Visual Field Feature Reconstruction for PCB CT Image Element Segmentation

May 09, 2024

Chen Chen, Kai Qiao, Jie Yang, Jian Chen, Bin Yan

Figure 1 for Efficient Pretraining Model based on Multi-Scale Local Visual Field Feature Reconstruction for PCB CT Image Element Segmentation

Figure 2 for Efficient Pretraining Model based on Multi-Scale Local Visual Field Feature Reconstruction for PCB CT Image Element Segmentation

Figure 3 for Efficient Pretraining Model based on Multi-Scale Local Visual Field Feature Reconstruction for PCB CT Image Element Segmentation

Figure 4 for Efficient Pretraining Model based on Multi-Scale Local Visual Field Feature Reconstruction for PCB CT Image Element Segmentation

Abstract:Element segmentation is a key step in nondestructive testing of Printed Circuit Boards (PCB) based on Computed Tomography (CT) technology. In recent years, the rapid development of self-supervised pretraining technology can obtain general image features without labeled samples, and then use a small amount of labeled samples to solve downstream tasks, which has a good potential in PCB element segmentation. At present, Masked Image Modeling (MIM) pretraining model has been initially applied in PCB CT image element segmentation. However, due to the small and regular size of PCB elements such as vias, wires, and pads, the global visual field has redundancy for a single element reconstruction, which may damage the performance of the model. Based on this issue, we propose an efficient pretraining model based on multi-scale local visual field feature reconstruction for PCB CT image element segmentation (EMLR-seg). In this model, the teacher-guided MIM pretraining model is introduced into PCB CT image element segmentation for the first time, and a multi-scale local visual field extraction (MVE) module is proposed to reduce redundancy by focusing on local visual fields. At the same time, a simple 4-Transformer-blocks decoder is used. Experiments show that EMLR-seg can achieve 88.6% mIoU on the PCB CT image dataset we proposed, which exceeds 1.2% by the baseline model, and the training time is reduced by 29.6 hours, a reduction of 17.4% under the same experimental condition, which reflects the advantage of EMLR-seg in terms of performance and efficiency.

Via

Access Paper or Ask Questions

Fourier Transform-based Wavenumber Domain 3D Imaging in RIS-aided Communication Systems

Apr 07, 2024

Yixuan Huang, Jie Yang, Wankai Tang, Chao-Kai Wen, Shi Jin

Figure 1 for Fourier Transform-based Wavenumber Domain 3D Imaging in RIS-aided Communication Systems

Figure 2 for Fourier Transform-based Wavenumber Domain 3D Imaging in RIS-aided Communication Systems

Figure 3 for Fourier Transform-based Wavenumber Domain 3D Imaging in RIS-aided Communication Systems

Figure 4 for Fourier Transform-based Wavenumber Domain 3D Imaging in RIS-aided Communication Systems

Abstract:Radio imaging is rapidly gaining prominence in the design of future communication systems, with the potential to utilize reconfigurable intelligent surfaces (RISs) as imaging apertures. Although the sparsity of targets in three-dimensional (3D) space has led most research to adopt compressed sensing (CS)-based imaging algorithms, these often require substantial computational and memory burdens. Drawing inspiration from conventional Fourier transform (FT)-based imaging methods, our research seeks to accelerate radio imaging in RIS-aided communication systems. To begin, we introduce a two-stage wavenumber domain 3D imaging technique: first, we modify RIS phase shifts to recover the equivalent channel response from the user equipment to the RIS array, subsequently employing traditional FT-based wavenumber domain methods to produce target images. We also determine the diffraction resolution limits of the system through k-space analysis, taking into account factors including system bandwidth, transmission direction, operating frequency, and the angle subtended by the RIS. Addressing the challenge of limited pilots in communication systems, we unveil an innovative algorithm that merges the strengths of both FT- and CS-based techniques by substituting the expansive sensing matrix with FT-based operators. Our simulation outcomes confirm that our proposed FT-based methods achieve high-quality images while demanding few time, memory, and communication resources.

* 16 pages, 11 figures, submitted to IEEE for possible publication

Via

Access Paper or Ask Questions

Exploring LLMs as a Source of Targeted Synthetic Textual Data to Minimize High Confidence Misclassifications

Apr 02, 2024

Philip Lippmann, Matthijs T. J. Spaan, Jie Yang

Figure 1 for Exploring LLMs as a Source of Targeted Synthetic Textual Data to Minimize High Confidence Misclassifications

Figure 2 for Exploring LLMs as a Source of Targeted Synthetic Textual Data to Minimize High Confidence Misclassifications

Figure 3 for Exploring LLMs as a Source of Targeted Synthetic Textual Data to Minimize High Confidence Misclassifications

Figure 4 for Exploring LLMs as a Source of Targeted Synthetic Textual Data to Minimize High Confidence Misclassifications

Abstract:Natural Language Processing (NLP) models optimized for predictive performance often make high confidence errors and suffer from vulnerability to adversarial and out-of-distribution data. Existing work has mainly focused on mitigation of such errors using either humans or an automated approach. In this study, we explore the usage of large language models (LLMs) for data augmentation as a potential solution to the issue of NLP models making wrong predictions with high confidence during classification tasks. We compare the effectiveness of synthetic data generated by LLMs with that of human data obtained via the same procedure. For mitigation, humans or LLMs provide natural language characterizations of high confidence misclassifications to generate synthetic data, which are then used to extend the training set. We conduct an extensive evaluation of our approach on three classification tasks and demonstrate its effectiveness in reducing the number of high confidence misclassifications present in the model, all while maintaining the same level of accuracy. Moreover, we find that the cost gap between humans and LLMs surpasses an order of magnitude, as LLMs attain human-like performance while being more scalable.

Via

Access Paper or Ask Questions

RIS-aided Single-frequency 3D Imaging by Exploiting Multi-view Image Correlations

Mar 18, 2024

Yixuan Huang, Jie Yang, Chao-Kai Wen, Shi Jin

Figure 1 for RIS-aided Single-frequency 3D Imaging by Exploiting Multi-view Image Correlations

Figure 2 for RIS-aided Single-frequency 3D Imaging by Exploiting Multi-view Image Correlations

Figure 3 for RIS-aided Single-frequency 3D Imaging by Exploiting Multi-view Image Correlations

Figure 4 for RIS-aided Single-frequency 3D Imaging by Exploiting Multi-view Image Correlations

Abstract:Retrieving range information in three-dimensional (3D) radio imaging is particularly challenging due to the limited communication bandwidth and pilot resources. To address this issue, we consider a reconfigurable intelligent surface (RIS)-aided uplink communication scenario, generating multiple measurements through RIS phase adjustment. This study successfully realizes 3D single-frequency imaging by exploiting the near-field multi-view image correlations deduced from user mobility. We first highlight the significance of considering anisotropy in multi-view image formation by investigating radar cross-section properties and diffraction resolution limits. We then propose a novel model for joint multi-view 3D imaging that incorporates occlusion effects and anisotropic scattering. These factors lead to slow image support variation and smooth coefficient evolution, which are mathematically modeled as Markov processes. Based on this model, we employ the Expectation Maximization-Turbo-Generalized Approximate Message Passing algorithm for joint multi-view single-frequency 3D imaging with limited measurements. Simulation results reveal the superiority of joint multi-view imaging in terms of enhanced imaging ranges, accuracies, and anisotropy characterization compared to single-view imaging. Combining adjacent observations for joint multi-view imaging enables a reduction in the measurement overhead by 80%.

* 16 pages, 12 figures, accepted by IEEE Transactions on Communications

Via

Access Paper or Ask Questions

Recent Advances in 3D Gaussian Splatting

Mar 17, 2024

Tong Wu, Yu-Jie Yuan, Ling-Xiao Zhang, Jie Yang, Yan-Pei Cao, Ling-Qi Yan, Lin Gao

Figure 1 for Recent Advances in 3D Gaussian Splatting

Figure 2 for Recent Advances in 3D Gaussian Splatting

Figure 3 for Recent Advances in 3D Gaussian Splatting

Figure 4 for Recent Advances in 3D Gaussian Splatting

Abstract:The emergence of 3D Gaussian Splatting (3DGS) has greatly accelerated the rendering speed of novel view synthesis. Unlike neural implicit representations like Neural Radiance Fields (NeRF) that represent a 3D scene with position and viewpoint-conditioned neural networks, 3D Gaussian Splatting utilizes a set of Gaussian ellipsoids to model the scene so that efficient rendering can be accomplished by rasterizing Gaussian ellipsoids into images. Apart from the fast rendering speed, the explicit representation of 3D Gaussian Splatting facilitates editing tasks like dynamic reconstruction, geometry editing, and physical simulation. Considering the rapid change and growing number of works in this field, we present a literature review of recent 3D Gaussian Splatting methods, which can be roughly classified into 3D reconstruction, 3D editing, and other downstream applications by functionality. Traditional point-based rendering methods and the rendering formulation of 3D Gaussian Splatting are also illustrated for a better understanding of this technique. This survey aims to help beginners get into this field quickly and provide experienced researchers with a comprehensive overview, which can stimulate the future development of the 3D Gaussian Splatting representation.

Via

Access Paper or Ask Questions

Generation is better than Modification: Combating High Class Homophily Variance in Graph Anomaly Detection

Mar 15, 2024

Rui Zhang, Dawei Cheng, Xin Liu, Jie Yang, Yi Ouyang, Xian Wu, Yefeng Zheng

Figure 1 for Generation is better than Modification: Combating High Class Homophily Variance in Graph Anomaly Detection

Figure 2 for Generation is better than Modification: Combating High Class Homophily Variance in Graph Anomaly Detection

Figure 3 for Generation is better than Modification: Combating High Class Homophily Variance in Graph Anomaly Detection

Figure 4 for Generation is better than Modification: Combating High Class Homophily Variance in Graph Anomaly Detection

Abstract:Graph-based anomaly detection is currently an important research topic in the field of graph neural networks (GNNs). We find that in graph anomaly detection, the homophily distribution differences between different classes are significantly greater than those in homophilic and heterophilic graphs. For the first time, we introduce a new metric called Class Homophily Variance, which quantitatively describes this phenomenon. To mitigate its impact, we propose a novel GNN model named Homophily Edge Generation Graph Neural Network (HedGe). Previous works typically focused on pruning, selecting or connecting on original relationships, and we refer to these methods as modifications. Different from these works, our method emphasizes generating new relationships with low class homophily variance, using the original relationships as an auxiliary. HedGe samples homophily adjacency matrices from scratch using a self-attention mechanism, and leverages nodes that are relevant in the feature space but not directly connected in the original graph. Additionally, we modify the loss function to punish the generation of unnecessary heterophilic edges by the model. Extensive comparison experiments demonstrate that HedGe achieved the best performance across multiple benchmark datasets, including anomaly detection and edgeless node classification. The proposed model also improves the robustness under the novel Heterophily Attack with increased class homophily variance on other graph classification tasks.

Via

Access Paper or Ask Questions

Intelligent Reflecting Surfaces vs. Full-Duplex Relays: A Comparison in the Air

Mar 14, 2024

Qian Ding, Jie Yang, Yang Luo, Chunbo Luo

Abstract:This letter aims to provide a fundamental analytical comparison for the two major types of relaying methods: intelligent reflecting surfaces and full-duplex relays, particularly focusing on unmanned aerial vehicle communication scenarios. Both amplify-and-forward and decode-and-forward relaying schemes are included in the comparison. In addition, optimal 3D UAV deployment and minimum transmit power under the quality of service constraint are derived. Our numerical results show that IRSs of medium size exhibit comparable performance to AF relays, meanwhile outperforming DF relays under extremely large surface size and high data rates.

* IEEE Communications Letters, vol. 28, no. 2, pp. 397-401, Feb. 2024

Via

Access Paper or Ask Questions

Guiding Clinical Reasoning with Large Language Models via Knowledge Seeds

Mar 11, 2024

Jiageng WU, Xian Wu, Jie Yang

Abstract:Clinical reasoning refers to the cognitive process that physicians employ in evaluating and managing patients. This process typically involves suggesting necessary examinations, diagnosing patients' diseases, and deciding on appropriate therapies, etc. Accurate clinical reasoning requires extensive medical knowledge and rich clinical experience, setting a high bar for physicians. This is particularly challenging in developing countries due to the overwhelming number of patients and limited physician resources, contributing significantly to global health inequity and necessitating automated clinical reasoning approaches. Recently, the emergence of large language models (LLMs) such as ChatGPT and GPT-4 have demonstrated their potential in clinical reasoning. However, these LLMs are prone to hallucination problems, and the reasoning process of LLMs may not align with the clinical decision path of physicians. In this study, we introduce a novel framework, In-Context Padding (ICP), designed to enhance LLMs with medical knowledge. Specifically, we infer critical clinical reasoning elements (referred to as knowledge seeds) and use these as anchors to guide the generation process of LLMs. Experiments on two clinical question datasets demonstrate that ICP significantly improves the clinical reasoning ability of LLMs.

Via

Access Paper or Ask Questions

MedKP: Medical Dialogue with Knowledge Enhancement and Clinical Pathway Encoding

Mar 11, 2024

Jiageng Wu, Xian Wu, Yefeng Zheng, Jie Yang

Figure 1 for MedKP: Medical Dialogue with Knowledge Enhancement and Clinical Pathway Encoding

Figure 2 for MedKP: Medical Dialogue with Knowledge Enhancement and Clinical Pathway Encoding

Figure 3 for MedKP: Medical Dialogue with Knowledge Enhancement and Clinical Pathway Encoding

Figure 4 for MedKP: Medical Dialogue with Knowledge Enhancement and Clinical Pathway Encoding

Abstract:With appropriate data selection and training techniques, Large Language Models (LLMs) have demonstrated exceptional success in various medical examinations and multiple-choice questions. However, the application of LLMs in medical dialogue generation-a task more closely aligned with actual medical practice-has been less explored. This gap is attributed to the insufficient medical knowledge of LLMs, which leads to inaccuracies and hallucinated information in the generated medical responses. In this work, we introduce the Medical dialogue with Knowledge enhancement and clinical Pathway encoding (MedKP) framework, which integrates an external knowledge enhancement module through a medical knowledge graph and an internal clinical pathway encoding via medical entities and physician actions. Evaluated with comprehensive metrics, our experiments on two large-scale, real-world online medical consultation datasets (MedDG and KaMed) demonstrate that MedKP surpasses multiple baselines and mitigates the incidence of hallucinations, achieving a new state-of-the-art. Extensive ablation studies further reveal the effectiveness of each component of MedKP. This enhancement advances the development of reliable, automated medical consultation responses using LLMs, thereby broadening the potential accessibility of precise and real-time medical assistance.

Via

Access Paper or Ask Questions

RESTORE: Towards Feature Shift for Vision-Language Prompt Learning

Mar 10, 2024

Yuncheng Yang, Chuyan Zhang, Zuopeng Yang, Yuting Gao, Yulei Qin, Ke Li, Xing Sun, Jie Yang, Yun Gu

Figure 1 for RESTORE: Towards Feature Shift for Vision-Language Prompt Learning

Figure 2 for RESTORE: Towards Feature Shift for Vision-Language Prompt Learning

Figure 3 for RESTORE: Towards Feature Shift for Vision-Language Prompt Learning

Figure 4 for RESTORE: Towards Feature Shift for Vision-Language Prompt Learning

Abstract:Prompt learning is effective for fine-tuning foundation models to improve their generalization across a variety of downstream tasks. However, the prompts that are independently optimized along a single modality path, may sacrifice the vision-language alignment of pre-trained models in return for improved performance on specific tasks and classes, leading to poorer generalization. In this paper, we first demonstrate that prompt tuning along only one single branch of CLIP (e.g., language or vision) is the reason why the misalignment occurs. Without proper regularization across the learnable parameters in different modalities, prompt learning violates the original pre-training constraints inherent in the two-tower architecture. To address such misalignment, we first propose feature shift, which is defined as the variation of embeddings after introducing the learned prompts, to serve as an explanatory tool. We dive into its relation with generalizability and thereafter propose RESTORE, a multi-modal prompt learning method that exerts explicit constraints on cross-modal consistency. To be more specific, to prevent feature misalignment, a feature shift consistency is introduced to synchronize inter-modal feature shifts by measuring and regularizing the magnitude of discrepancy during prompt tuning. In addition, we propose a "surgery" block to avoid short-cut hacking, where cross-modal misalignment can still be severe if the feature shift of each modality varies drastically at the same rate. It is implemented as feed-forward adapters upon both modalities to alleviate the misalignment problem. Extensive experiments on 15 datasets demonstrate that our method outperforms the state-of-the-art prompt tuning methods without compromising feature alignment.

* 18 pages, 5 figures

Via

Access Paper or Ask Questions