Lei Feng

Southeast University

Defending Multimodal Backdoored Models by Repulsive Visual Prompt Tuning

Dec 29, 2024

Zhifang Zhang, Shuo He, Bingquan Shen, Lei Feng

Abstract:Multimodal contrastive learning models (e.g., CLIP) can learn high-quality representations from large-scale image-text datasets, yet they exhibit significant vulnerabilities to backdoor attacks, raising serious safety concerns. In this paper, we disclose that CLIP's vulnerabilities primarily stem from its excessive encoding of class-irrelevant features, which can compromise the model's visual feature resistivity to input perturbations, making it more susceptible to capturing the trigger patterns inserted by backdoor attacks. Inspired by this finding, we propose Repulsive Visual Prompt Tuning (RVPT), a novel defense approach that employs specially designed deep visual prompt tuning and feature-repelling loss to eliminate excessive class-irrelevant features while simultaneously optimizing cross-entropy loss to maintain clean accuracy. Unlike existing multimodal backdoor defense methods that typically require the availability of poisoned data or involve fine-tuning the entire model, RVPT leverages few-shot downstream clean samples and only tunes a small number of parameters. Empirical results demonstrate that RVPT tunes only 0.27% of the parameters relative to CLIP, yet it significantly outperforms state-of-the-art baselines, reducing the attack success rate from 67.53% to 2.76% against SoTA attacks and effectively generalizing its defensive capabilities across multiple datasets.
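
As a rough illustration of the objective described in the abstract above, the sketch below adds a repelling term to a cross-entropy term. The abstract does not give the formula, so everything here is an assumption: the repelling term is taken to be the cosine similarity between the prompted visual feature and the frozen model's feature, and the names `combined_loss` and `alpha` are hypothetical. It is not the authors' implementation.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cross_entropy(logits, label):
    """Cross-entropy of one sample, using a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def combined_loss(prompted_feat, frozen_feat, logits, label, alpha=1.0):
    """Cross-entropy plus an ASSUMED repelling term.

    Minimizing the cosine similarity pushes the prompted feature away
    from the frozen feature; alpha (hypothetical) balances the two terms.
    """
    repel = cosine_similarity(prompted_feat, frozen_feat)
    return cross_entropy(logits, label) + alpha * repel
```

In this sketch only the prompt parameters that produce `prompted_feat` and `logits` would be trained, which matches the abstract's point that a small fraction of the parameters is tuned.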

Rethinking Chain-of-Thought from the Perspective of Self-Training

Dec 14, 2024

Zongqian Wu, Baoduo Xu, Ruochen Cui, Mengmeng Zhan, Xiaofeng Zhu, Lei Feng

Abstract:Chain-of-thought (CoT) reasoning has emerged as an effective approach for activating latent capabilities in large language models (LLMs). We observe that CoT shares significant similarities with self-training in terms of their learning processes. Motivated by these parallels, this paper explores the underlying relationship between CoT and self-training, demonstrating how insights from self-training can enhance CoT performance. Specifically, our study first reveals that CoT, like self-training, follows the principle of semantic entropy minimization. Leveraging this insight, we propose a novel CoT framework that incorporates two key components: (i) a task-specific prompt module designed to guide LLMs in generating high-quality initial reasoning processes, and (ii) an adaptive reasoning iteration module for progressively refining the reasoning process.

* 16 pages, 12 figures
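
The semantic entropy mentioned in the abstract above can be illustrated with a small sketch. It assumes that several reasoning chains have been sampled for one question and that two final answers count as semantically equivalent when they match after trimming and lowercasing; that grouping rule and the function name `semantic_entropy` are illustrative assumptions, not the paper's procedure.

```python
import math
from collections import Counter

def semantic_entropy(answers):
    """Shannon entropy (in nats) over groups of equivalent final answers.

    ASSUMPTION: equivalence is exact match after strip() and lower().
    A real system would need a proper semantic-equivalence check.
    """
    groups = Counter(a.strip().lower() for a in answers)
    total = sum(groups.values())
    return -sum((c / total) * math.log(c / total) for c in groups.values())
```

Under this measure, sampled answers that all agree give zero entropy, while answers split evenly between two groups give log 2, so lower values indicate more consistent reasoning outputs.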

Dual-Head Knowledge Distillation: Enhancing Logits Utilization with an Auxiliary Head

Nov 13, 2024

Penghui Yang, Chen-Chen Zong, Sheng-Jun Huang, Lei Feng, Bo An

Abstract:Traditional knowledge distillation focuses on aligning the student's predicted probabilities with both ground-truth labels and the teacher's predicted probabilities. However, the transition to predicted probabilities from logits would obscure certain indispensable information. To address this issue, it is intuitive to additionally introduce a logit-level loss function as a supplement to the widely used probability-level loss function, for exploiting the latent information of logits. Unfortunately, we empirically find that the amalgamation of the newly introduced logit-level loss and the previous probability-level loss will lead to performance degeneration, even trailing behind the performance of employing either loss in isolation. We attribute this phenomenon to the collapse of the classification head, which is verified by our theoretical analysis based on the neural collapse theory. Specifically, the gradients of the two loss functions exhibit contradictions in the linear classifier yet display no such conflict within the backbone. Drawing from the theoretical analysis, we propose a novel method called dual-head knowledge distillation, which partitions the linear classifier into two classification heads responsible for different losses, thereby preserving the beneficial effects of both losses on the backbone while eliminating adverse influences on the classification head. Extensive experiments validate that our method can effectively exploit the information inside the logits and achieve superior performance against state-of-the-art counterparts.

* Preprint
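
The two-head idea in the abstract above can be sketched as follows: one linear head receives the probability-level loss and a separate auxiliary head receives the logit-level loss, while both read the same backbone feature. The specific losses (KL divergence on softmax outputs, mean squared error on logits) and all function names are assumptions for illustration; the ground-truth cross-entropy term is omitted for brevity.

```python
import math

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def linear_head(weights, feature):
    """Bias-free linear classifier: one weight row per class."""
    return [sum(w * x for w, x in zip(row, feature)) for row in weights]

def dual_head_losses(feature, head_prob, head_logit, teacher_logits):
    """Return (probability-level loss, logit-level loss).

    Each loss touches a different head, so their gradients can only
    meet in the shared backbone that produced `feature`.
    """
    main_logits = linear_head(head_prob, feature)
    aux_logits = linear_head(head_logit, feature)
    prob_loss = kl_divergence(softmax(teacher_logits), softmax(main_logits))
    logit_loss = sum((s - t) ** 2 for s, t in zip(aux_logits, teacher_logits))
    return prob_loss, logit_loss / len(teacher_logits)
```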

ELU-GCN: Effectively Label-Utilizing Graph Convolutional Network

Nov 04, 2024

Jincheng Huang, Yujie Mo, Xiaoshuang Shi, Lei Feng, Xiaofeng Zhu

Abstract:The message-passing mechanism of graph convolutional networks (GCNs) enables label information to be propagated to a broader range of neighbors, thereby increasing the utilization of labels. However, the label information is not always effectively utilized in the traditional GCN framework. To address this issue, we propose a new two-stage framework called ELU-GCN. In the first stage, ELU-GCN conducts graph learning to learn a new graph structure (i.e., the ELU-graph), which enables GCNs to effectively utilize label information. In the second stage, we design a new graph contrastive learning method on the GCN framework for representation learning by exploring the consistency and mutually exclusive information between the learned ELU-graph and the original graph. Moreover, we theoretically demonstrate that the proposed method can ensure the generalization ability of GCNs. Extensive experiments validate the superiority of the proposed method.

Realtime Particulate Matter and Bacteria Analysis of Peritoneal Dialysis Fluid using Digital Inline Holography

Nov 01, 2024

Nicholas Bravo-Frank, Nicolas Mesyngier, Lei Feng, Jiarong Hong

Abstract:We developed a digital inline holography (DIH) system integrated with deep learning algorithms for real-time detection of particulate matter (PM) and bacterial contamination in peritoneal dialysis (PD) fluids. The system comprises a microfluidic sample delivery module and a DIH imaging module that captures holograms using a pulsed laser and a digital camera with a 40x objective. Our data processing pipeline enhances holograms, reconstructs images, and employs a YOLOv8n-based deep learning model for particle identification and classification, trained on labeled holograms of generic PD particles, Escherichia coli (E. coli), and Pseudomonas aeruginosa (P. aeruginosa). The system effectively detected and classified generic particles in sterile PD fluids, revealing diverse morphologies predominantly sized 1-5 µm with an average concentration of 61 particles per microliter. In PD fluid samples spiked with high concentrations of E. coli and P. aeruginosa, our system achieved high sensitivity in detecting and classifying these bacteria at clinically relevant low false positive rates. Further validation against standard colony-forming unit (CFU) methods using PD fluid spiked with bacterial concentrations from approximately 100 to 10,000 bacteria per milliliter demonstrated a clear one-to-one correspondence between our measurements and CFU counts. Our DIH system provides a rapid, accurate alternative to traditional culture-based methods for assessing bacterial contamination in PD fluids. By enabling real-time sterility monitoring, it can significantly improve patient outcomes in PD treatment, facilitate point-of-care fluid production, reduce logistical challenges, and be extended to quality control in pharmaceutical production.

* 16 pages, 5 figures

Bayesian-guided Label Mapping for Visual Reprogramming

Oct 31, 2024

Chengyi Cai, Zesheng Ye, Lei Feng, Jianzhong Qi, Feng Liu

Abstract:Visual reprogramming (VR) leverages the intrinsic capabilities of pretrained vision models by adapting their input or output interfaces to solve downstream tasks whose labels (i.e., downstream labels) might be totally different from the labels associated with the pretrained models (i.e., pretrained labels). When adapting the output interface, label mapping methods transform the pretrained labels to downstream labels by establishing a gradient-free one-to-one correspondence between the two sets of labels. However, in this paper, we reveal that one-to-one mappings may overlook the complex relationship between pretrained and downstream labels. Motivated by this observation, we propose a Bayesian-guided Label Mapping (BLM) method. BLM constructs an iteratively-updated probabilistic label mapping matrix, with each element quantifying a pairwise relationship between pretrained and downstream labels. The assignment of values to the constructed matrix is guided by Bayesian conditional probability, considering the joint distribution of the downstream labels and the labels predicted by the pretrained model on downstream samples. Experiments conducted on both pretrained vision models (e.g., ResNeXt) and vision-language models (e.g., CLIP) demonstrate the superior performance of BLM over existing label mapping methods. The success of BLM also offers a probabilistic lens through which to understand and analyze the effectiveness of VR. Our code is available at https://github.com/tmlr-group/BayesianLM.
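
A probabilistic label mapping of the kind the abstract above describes can be sketched with simple counting. The version below estimates P(downstream label | pretrained label) from co-occurrence counts on downstream samples and then uses that matrix to turn a pretrained model's class probabilities into downstream class probabilities. The additive smoothing, the single-pass estimate (the paper's matrix is updated iteratively), and the function names are assumptions for illustration; the authors' code is at the repository linked in the abstract.

```python
def label_mapping_matrix(pretrained_preds, downstream_labels,
                         n_pre, n_down, smoothing=1.0):
    """Estimate matrix[p][d] = P(downstream d | pretrained p) from counts.

    pretrained_preds[i] is the pretrained model's predicted label for
    downstream sample i; downstream_labels[i] is its true label.
    ASSUMPTION: additive smoothing keeps unseen rows well defined.
    """
    counts = [[smoothing] * n_down for _ in range(n_pre)]
    for p, d in zip(pretrained_preds, downstream_labels):
        counts[p][d] += 1.0
    return [[c / sum(row) for c in row] for row in counts]

def map_probabilities(pretrained_probs, matrix):
    """Downstream probabilities: sum over p of P(p | x) * P(d | p)."""
    n_down = len(matrix[0])
    return [sum(pretrained_probs[p] * matrix[p][d]
                for p in range(len(matrix)))
            for d in range(n_down)]
```

Unlike a one-to-one mapping, every pretrained label can contribute to every downstream label here, weighted by how often the two co-occur.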

Generative AI Enabled Matching for 6G Multiple Access

Oct 29, 2024

Xudong Wang, Hongyang Du, Dusit Niyato, Lijie Zhou, Lei Feng, Zhixiang Yang, Fanqin Zhou, Wenjing Li

Abstract:In wireless networks, applying deep learning models to solve matching problems between different entities has become a mainstream and effective approach. However, the complex network topology in 6G multiple access presents significant challenges for the real-time performance and stability of matching generation. Generative artificial intelligence (GenAI) has demonstrated strong capabilities in graph feature extraction, exploration, and generation, offering potential for graph-structured matching generation. In this paper, we propose a GenAI-enabled matching generation framework to support 6G multiple access. Specifically, we first summarize the classical matching theory, discuss common GenAI models and applications from the perspective of matching generation. Then, we propose a framework based on generative diffusion models (GDMs) that iteratively denoises toward reward maximization to generate a matching strategy that meets specific requirements. Experimental results show that, compared to decision-based AI approaches, our framework can generate more effective matching strategies based on given conditions and predefined rewards, helping to solve complex problems in 6G multiple access, such as task allocation.

* 8 pages, 5 figures

Prototype-based Optimal Transport for Out-of-Distribution Detection

Oct 10, 2024

Ao Ke, Wenlong Chen, Chuanwen Feng, Yukun Cao, Xike Xie, S. Kevin Zhou, Lei Feng

Abstract:Detecting Out-of-Distribution (OOD) inputs is crucial for improving the reliability of deep neural networks in real-world deployment. In this paper, inspired by the inherent distribution shift between in-distribution (ID) and OOD data, we propose a novel method that leverages optimal transport to measure the distribution discrepancy between test inputs and ID prototypes. The resulting transport costs are used to quantify the individual contribution of each test input to the overall discrepancy, serving as a desirable measure for OOD detection. To address the issue that solely relying on the transport costs to ID prototypes is inadequate for identifying OOD inputs closer to ID data, we generate virtual outliers to approximate the OOD region via linear extrapolation. By combining the transport costs to ID prototypes with the costs to virtual outliers, the detection of OOD data near ID data is emphasized, thereby enhancing the distinction between ID and OOD inputs. Experiments demonstrate the superiority of our method over state-of-the-art methods.
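
The per-input transport cost described in the abstract above can be sketched with entropic optimal transport solved by Sinkhorn iterations. The solver choice, the squared Euclidean cost, the uniform masses, and the names `sinkhorn_costs`, `eps`, and `n_iters` are assumptions made for illustration; the virtual-outlier step is not included.

```python
import math

def sinkhorn_costs(test_feats, prototypes, eps=1.0, n_iters=200):
    """Per-test-input transport cost to a set of ID prototypes.

    ASSUMPTIONS: uniform mass on inputs and prototypes, squared
    Euclidean cost, entropic regularization with strength eps.
    """
    n, m = len(test_feats), len(prototypes)
    cost = [[sum((x - p) ** 2 for x, p in zip(t, proto))
             for proto in prototypes] for t in test_feats]
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    a, b = 1.0 / n, 1.0 / m
    u = [1.0] * n
    for _ in range(n_iters):
        # Alternately rescale to match column, then row, marginals.
        v = [b / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
        u = [a / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
    # Transport plan entry is u[i] * kernel[i][j] * v[j].
    return [sum(u[i] * kernel[i][j] * v[j] * cost[i][j] for j in range(m))
            for i in range(n)]
```

An input far from every prototype must still ship its share of mass, so it accumulates a larger cost than inputs lying near a prototype, and that cost can serve as an OOD score.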

Automatic Dataset Construction (ADC): Sample Collection, Data Curation, and Beyond

Aug 21, 2024

Minghao Liu, Zonglin Di, Jiaheng Wei, Zhongruo Wang, Hengxiang Zhang, Ruixuan Xiao, Haoyu Wang, Jinlong Pang, Hao Chen, Ankit Shah(+8 more)

Abstract:Large-scale data collection is essential for developing personalized training data, mitigating the shortage of training data, and fine-tuning specialized models. However, creating high-quality datasets quickly and accurately remains a challenge due to annotation errors and the substantial time and costs associated with human labor. To address these issues, we propose Automatic Dataset Construction (ADC), an innovative methodology that automates dataset creation with negligible cost and high efficiency. Taking the image classification task as a starting point, ADC leverages LLMs for the detailed class design and code generation to collect relevant samples via search engines, significantly reducing the need for manual annotation and speeding up the data generation process. Despite these advantages, ADC also encounters real-world challenges such as label errors (label noise) and imbalanced data distributions (label bias). We provide open-source software that incorporates existing methods for label error detection and robust learning under noisy and biased data, ensuring higher-quality training data and a more robust model training procedure. Furthermore, we design three benchmark datasets focused on label noise detection, label noise learning, and class-imbalanced learning. These datasets are vital because there are few existing datasets specifically for label noise detection, despite its importance. Finally, we evaluate the performance of existing popular methods on these datasets, thereby facilitating further research in the field.

CommonUppRoad: A Framework of Formal Modelling, Verifying, Learning, and Visualisation of Autonomous Vehicles

Aug 02, 2024

Rong Gu, Kaige Tan, Andreas Holck Høeg-Petersen, Lei Feng, Kim Guldstrand Larsen

Abstract:Combining machine learning and formal methods (FMs) provides a possible solution to overcome the safety issue of autonomous driving (AD) vehicles. However, there are gaps to be bridged before this combination becomes practically applicable and useful. In an attempt to support researchers in both the FM and AD areas, this paper proposes a framework that combines two well-known tools, namely CommonRoad and UPPAAL. On the one hand, CommonRoad can be enhanced by the rigorous semantics of models in UPPAAL, which enables a systematic and comprehensive understanding of the AD system's behaviour and thus strengthens the safety of the system. On the other hand, controllers synthesised by UPPAAL can be visualised by CommonRoad in real-world road networks, which greatly facilitates the adoption of formal models by AD vehicle designers in system design. In this framework, we provide automatic model conversions between CommonRoad and UPPAAL. Therefore, users only need to program in Python and the framework takes care of the formal models, learning, and verification in the backend. We perform experiments to demonstrate the applicability of our framework in various AD scenarios, discuss the advantages of solving motion planning in our framework, and show the scalability limit and possible solutions.

* 20 pages, 5 figures, ISoLA 2024
