Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Baoyuan Wu

Decentralized Directed Collaboration for Personalized Federated Learning

May 28, 2024

Yingqi Liu, Yifan Shi, Qinglun Li, Baoyuan Wu, Xueqian Wang, Li Shen

Abstract:Personalized Federated Learning (PFL) is proposed to find the greatest personalized models for each client. To avoid the central failure and communication bottleneck in the server-based FL, we concentrate on the Decentralized Personalized Federated Learning (DPFL) that performs distributed model training in a Peer-to-Peer (P2P) manner. Most personalized works in DPFL are based on undirected and symmetric topologies, however, the data, computation and communication resources heterogeneity result in large variances in the personalized models, which lead the undirected aggregation to suboptimal personalized performance and unguaranteed convergence. To address these issues, we propose a directed collaboration DPFL framework by incorporating stochastic gradient push and partial model personalized, called \textbf{D}ecentralized \textbf{Fed}erated \textbf{P}artial \textbf{G}radient \textbf{P}ush (\textbf{DFedPGP}). It personalizes the linear classifier in the modern deep model to customize the local solution and learns a consensus representation in a fully decentralized manner. Clients only share gradients with a subset of neighbors based on the directed and asymmetric topologies, which guarantees flexible choices for resource efficiency and better convergence. Theoretically, we show that the proposed DFedPGP achieves a superior convergence rate of $\mathcal{O}(\frac{1}{\sqrt{T}})$ in the general non-convex setting, and prove the tighter connectivity among clients will speed up the convergence. The proposed method achieves state-of-the-art (SOTA) accuracy in both data and computation heterogeneity scenarios, demonstrating the efficiency of the directed collaboration and partial gradient push.

* CVPR 2024. arXiv admin note: text overlap with arXiv:2305.15157

Via

Access Paper or Ask Questions

Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor

May 25, 2024

Shaokui Wei, Hongyuan Zha, Baoyuan Wu

Figure 1 for Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor

Figure 2 for Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor

Figure 3 for Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor

Figure 4 for Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor

Abstract:Data-poisoning backdoor attacks are serious security threats to machine learning models, where an adversary can manipulate the training dataset to inject backdoors into models. In this paper, we focus on in-training backdoor defense, aiming to train a clean model even when the dataset may be potentially poisoned. Unlike most existing methods that primarily detect and remove/unlearn suspicious samples to mitigate malicious backdoor attacks, we propose a novel defense approach called PDB (Proactive Defensive Backdoor). Specifically, PDB leverages the "home field" advantage of defenders by proactively injecting a defensive backdoor into the model during training. Taking advantage of controlling the training process, the defensive backdoor is designed to suppress the malicious backdoor effectively while remaining secret to attackers. In addition, we introduce a reversible mapping to determine the defensive target label. During inference, PDB embeds a defensive trigger in the inputs and reverses the model's prediction, suppressing malicious backdoor and ensuring the model's utility on the original task. Experimental results across various datasets and models demonstrate that our approach achieves state-of-the-art defense performance against a wide range of backdoor attacks.

* 13 pages, 5 figures and 5 tables

Via

Access Paper or Ask Questions

Fragile Model Watermark for integrity protection: leveraging boundary volatility and sensitive sample-pairing

Apr 11, 2024

ZhenZhe Gao, Zhenjun Tang, Zhaoxia Yin, Baoyuan Wu, Yue Lu

Figure 1 for Fragile Model Watermark for integrity protection: leveraging boundary volatility and sensitive sample-pairing

Figure 2 for Fragile Model Watermark for integrity protection: leveraging boundary volatility and sensitive sample-pairing

Figure 3 for Fragile Model Watermark for integrity protection: leveraging boundary volatility and sensitive sample-pairing

Figure 4 for Fragile Model Watermark for integrity protection: leveraging boundary volatility and sensitive sample-pairing

Abstract:Neural networks have increasingly influenced people's lives. Ensuring the faithful deployment of neural networks as designed by their model owners is crucial, as they may be susceptible to various malicious or unintentional modifications, such as backdooring and poisoning attacks. Fragile model watermarks aim to prevent unexpected tampering that could lead DNN models to make incorrect decisions. They ensure the detection of any tampering with the model as sensitively as possible.However, prior watermarking methods suffered from inefficient sample generation and insufficient sensitivity, limiting their practical applicability. Our approach employs a sample-pairing technique, placing the model boundaries between pairs of samples, while simultaneously maximizing logits. This ensures that the model's decision results of sensitive samples change as much as possible and the Top-1 labels easily alter regardless of the direction it moves.

* The article has been accepted by IEEE International Conference on Multimedia and Expo 2024

Via

Access Paper or Ask Questions

Can ChatGPT Detect DeepFakes? A Study of Using Multimodal Large Language Models for Media Forensics

Mar 26, 2024

Shan Jia, Reilin Lyu, Kangran Zhao, Yize Chen, Zhiyuan Yan, Yan Ju, Chuanbo Hu, Xin Li, Baoyuan Wu, Siwei Lyu

Figure 1 for Can ChatGPT Detect DeepFakes? A Study of Using Multimodal Large Language Models for Media Forensics

Figure 2 for Can ChatGPT Detect DeepFakes? A Study of Using Multimodal Large Language Models for Media Forensics

Figure 3 for Can ChatGPT Detect DeepFakes? A Study of Using Multimodal Large Language Models for Media Forensics

Figure 4 for Can ChatGPT Detect DeepFakes? A Study of Using Multimodal Large Language Models for Media Forensics

Abstract:DeepFakes, which refer to AI-generated media content, have become an increasing concern due to their use as a means for disinformation. Detecting DeepFakes is currently solved with programmed machine learning algorithms. In this work, we investigate the capabilities of multimodal large language models (LLMs) in DeepFake detection. We conducted qualitative and quantitative experiments to demonstrate multimodal LLMs and show that they can expose AI-generated images through careful experimental design and prompt engineering. This is interesting, considering that LLMs are not inherently tailored for media forensic tasks, and the process does not require programming. We discuss the limitations of multimodal LLMs for these tasks and suggest possible improvements.

Via

Access Paper or Ask Questions

Data-Independent Operator: A Training-Free Artifact Representation Extractor for Generalizable Deepfake Detection

Mar 11, 2024

Chuangchuang Tan, Ping Liu, RenShuai Tao, Huan Liu, Yao Zhao, Baoyuan Wu, Yunchao Wei

Figure 1 for Data-Independent Operator: A Training-Free Artifact Representation Extractor for Generalizable Deepfake Detection

Figure 2 for Data-Independent Operator: A Training-Free Artifact Representation Extractor for Generalizable Deepfake Detection

Figure 3 for Data-Independent Operator: A Training-Free Artifact Representation Extractor for Generalizable Deepfake Detection

Figure 4 for Data-Independent Operator: A Training-Free Artifact Representation Extractor for Generalizable Deepfake Detection

Abstract:Recently, the proliferation of increasingly realistic synthetic images generated by various generative adversarial networks has increased the risk of misuse. Consequently, there is a pressing need to develop a generalizable detector for accurately recognizing fake images. The conventional methods rely on generating diverse training sources or large pretrained models. In this work, we show that, on the contrary, the small and training-free filter is sufficient to capture more general artifact representations. Due to its unbias towards both the training and test sources, we define it as Data-Independent Operator (DIO) to achieve appealing improvements on unseen sources. In our framework, handcrafted filters and the randomly-initialized convolutional layer can be used as the training-free artifact representations extractor with excellent results. With the data-independent operator of a popular classifier, such as Resnet50, one could already reach a new state-of-the-art without bells and whistles. We evaluate the effectiveness of the DIO on 33 generation models, even DALLE and Midjourney. Our detector achieves a remarkable improvement of $13.3\%$, establishing a new state-of-the-art performance. The DIO and its extension can serve as strong baselines for future methods. The code is available at \url{https://github.com/chuangchuangtan/Data-Independent-Operator}.

* 12 pages, 3 figures

Via

Access Paper or Ask Questions

Invariant Test-Time Adaptation for Vision-Language Model Generalization

Mar 01, 2024

Huan Ma, Yan Zhu, Changqing Zhang, Peilin Zhao, Baoyuan Wu, Long-Kai Huang, Qinghua Hu, Bingzhe Wu

Figure 1 for Invariant Test-Time Adaptation for Vision-Language Model Generalization

Figure 2 for Invariant Test-Time Adaptation for Vision-Language Model Generalization

Figure 3 for Invariant Test-Time Adaptation for Vision-Language Model Generalization

Figure 4 for Invariant Test-Time Adaptation for Vision-Language Model Generalization

Abstract:Vision-language foundation models have exhibited remarkable success across a multitude of downstream tasks due to their scalability on extensive image-text paired datasets. However, these models display significant limitations when applied to long-tail tasks, such as fine-grained image classification, as a result of "decision shortcuts" that hinders their generalization capabilities. In this work, we find that the CLIP model possesses a rich set of features, encompassing both \textit{desired invariant causal features} and \textit{undesired decision shortcuts}. Moreover, the underperformance of CLIP on downstream tasks originates from its inability to effectively utilize pre-trained features in accordance with specific task requirements. To address this challenge, this paper introduces a test-time prompt tuning paradigm that optimizes a learnable prompt, thereby compelling the model to exploit genuine causal invariant features while disregarding decision shortcuts during the inference phase. The proposed method effectively alleviates excessive dependence on potentially misleading, task-irrelevant contextual information, while concurrently emphasizing critical, task-related visual cues. We conduct comparative analysis of the proposed method against various approaches which validates its effectiveness.

Via

Access Paper or Ask Questions

BackdoorBench: A Comprehensive Benchmark and Analysis of Backdoor Learning

Jan 26, 2024

Baoyuan Wu, Hongrui Chen, Mingda Zhang, Zihao Zhu, Shaokui Wei, Danni Yuan, Mingli Zhu, Ruotong Wang, Li Liu, Chao Shen

Figure 1 for BackdoorBench: A Comprehensive Benchmark and Analysis of Backdoor Learning

Figure 2 for BackdoorBench: A Comprehensive Benchmark and Analysis of Backdoor Learning

Figure 3 for BackdoorBench: A Comprehensive Benchmark and Analysis of Backdoor Learning

Figure 4 for BackdoorBench: A Comprehensive Benchmark and Analysis of Backdoor Learning

Abstract:As an emerging and vital topic for studying deep neural networks' vulnerability (DNNs), backdoor learning has attracted increasing interest in recent years, and many seminal backdoor attack and defense algorithms are being developed successively or concurrently, in the status of a rapid arms race. However, mainly due to the diverse settings, and the difficulties of implementation and reproducibility of existing works, there is a lack of a unified and standardized benchmark of backdoor learning, causing unfair comparisons, and unreliable conclusions (e.g., misleading, biased or even false conclusions). Consequently, it is difficult to evaluate the current progress and design the future development roadmap of this literature. To alleviate this dilemma, we build a comprehensive benchmark of backdoor learning called BackdoorBench. Our benchmark makes three valuable contributions to the research community. 1) We provide an integrated implementation of state-of-the-art (SOTA) backdoor learning algorithms (currently including 16 attack and 27 defense algorithms), based on an extensible modular-based codebase. 2) We conduct comprehensive evaluations of 12 attacks against 16 defenses, with 5 poisoning ratios, based on 4 models and 4 datasets, thus 11,492 pairs of evaluations in total. 3) Based on above evaluations, we present abundant analysis from 8 perspectives via 18 useful analysis tools, and provide several inspiring insights about backdoor learning. We hope that our efforts could build a solid foundation of backdoor learning to facilitate researchers to investigate existing algorithms, develop more innovative algorithms, and explore the intrinsic mechanism of backdoor learning. Finally, we have created a user-friendly website at http://backdoorbench.com, which collects all important information of BackdoorBench, including codebase, docs, leaderboard, and model Zoo.

Via

Access Paper or Ask Questions

Enhanced Few-Shot Class-Incremental Learning via Ensemble Models

Jan 14, 2024

Mingli Zhu, Zihao Zhu, Sihong Chen, Chen Chen, Baoyuan Wu

Figure 1 for Enhanced Few-Shot Class-Incremental Learning via Ensemble Models

Figure 2 for Enhanced Few-Shot Class-Incremental Learning via Ensemble Models

Figure 3 for Enhanced Few-Shot Class-Incremental Learning via Ensemble Models

Figure 4 for Enhanced Few-Shot Class-Incremental Learning via Ensemble Models

Abstract:Few-shot class-incremental learning (FSCIL) aims to continually fit new classes with limited training data, while maintaining the performance of previously learned classes. The main challenges are overfitting the rare new training samples and forgetting old classes. While catastrophic forgetting has been extensively studied, the overfitting problem has attracted less attention in FSCIL. To tackle overfitting challenge, we design a new ensemble model framework cooperated with data augmentation to boost generalization. In this way, the enhanced model works as a library storing abundant features to guarantee fast adaptation to downstream tasks. Specifically, the multi-input multi-output ensemble structure is applied with a spatial-aware data augmentation strategy, aiming at diversifying the feature extractor and alleviating overfitting in incremental sessions. Moreover, self-supervised learning is also integrated to further improve the model generalization. Comprehensive experimental results show that the proposed method can indeed mitigate the overfitting problem in FSCIL, and outperform the state-of-the-art methods.

Via

Access Paper or Ask Questions

Defenses in Adversarial Machine Learning: A Survey

Dec 13, 2023

Baoyuan Wu, Shaokui Wei, Mingli Zhu, Meixi Zheng, Zihao Zhu, Mingda Zhang, Hongrui Chen, Danni Yuan, Li Liu, Qingshan Liu

Figure 1 for Defenses in Adversarial Machine Learning: A Survey

Figure 2 for Defenses in Adversarial Machine Learning: A Survey

Figure 3 for Defenses in Adversarial Machine Learning: A Survey

Figure 4 for Defenses in Adversarial Machine Learning: A Survey

Abstract:Adversarial phenomenon has been widely observed in machine learning (ML) systems, especially in those using deep neural networks, describing that ML systems may produce inconsistent and incomprehensible predictions with humans at some particular cases. This phenomenon poses a serious security threat to the practical application of ML systems, and several advanced attack paradigms have been developed to explore it, mainly including backdoor attacks, weight attacks, and adversarial examples. For each individual attack paradigm, various defense paradigms have been developed to improve the model robustness against the corresponding attack paradigm. However, due to the independence and diversity of these defense paradigms, it is difficult to examine the overall robustness of an ML system against different kinds of attacks.This survey aims to build a systematic review of all existing defense paradigms from a unified perspective. Specifically, from the life-cycle perspective, we factorize a complete machine learning system into five stages, including pre-training, training, post-training, deployment, and inference stages, respectively. Then, we present a clear taxonomy to categorize and review representative defense methods at each individual stage. The unified perspective and presented taxonomies not only facilitate the analysis of the mechanism of each defense paradigm but also help us to understand connections and differences among different defense paradigms, which may inspire future research to develop more advanced, comprehensive defenses.

* 21 pages, 5 figures, 2 tables, 237 reference papers

Via

Access Paper or Ask Questions

Task-Distributionally Robust Data-Free Meta-Learning

Nov 23, 2023

Zixuan Hu, Li Shen, Zhenyi Wang, Yongxian Wei, Baoyuan Wu, Chun Yuan, Dacheng Tao

Figure 1 for Task-Distributionally Robust Data-Free Meta-Learning

Figure 2 for Task-Distributionally Robust Data-Free Meta-Learning

Figure 3 for Task-Distributionally Robust Data-Free Meta-Learning

Figure 4 for Task-Distributionally Robust Data-Free Meta-Learning

Abstract:Data-Free Meta-Learning (DFML) aims to efficiently learn new tasks by leveraging multiple pre-trained models without requiring their original training data. Existing inversion-based DFML methods construct pseudo tasks from a learnable dataset, which is inversely generated from the pre-trained model pool. For the first time, we reveal two major challenges hindering their practical deployments: Task-Distribution Shift (TDS) and Task-Distribution Corruption (TDC). TDS leads to a biased meta-learner because of the skewed task distribution towards newly generated tasks. TDC occurs when untrusted models characterized by misleading labels or poor quality pollute the task distribution. To tackle these issues, we introduce a robust DFML framework that ensures task distributional robustness. We propose to meta-learn from a pseudo task distribution, diversified through task interpolation within a compact task-memory buffer. This approach reduces the meta-learner's overreliance on newly generated tasks by maintaining consistent performance across a broader range of interpolated memory tasks, thus ensuring its generalization for unseen tasks. Additionally, our framework seamlessly incorporates an automated model selection mechanism into the meta-training phase, parameterizing each model's reliability as a learnable weight. This is optimized with a policy gradient algorithm inspired by reinforcement learning, effectively addressing the non-differentiable challenge posed by model selection. Comprehensive experiments across various datasets demonstrate the framework's effectiveness in mitigating TDS and TDC, underscoring its potential to improve DFML in real-world scenarios.

Via

Access Paper or Ask Questions