Jingfeng Zhang

Accurate Forgetting for Heterogeneous Federated Continual Learning

Feb 20, 2025

Abudukelimu Wuerkaixi, Sen Cui, Jingfeng Zhang, Kunda Yan, Bo Han, Gang Niu, Lei Fang, Changshui Zhang, Masashi Sugiyama

Abstract: Recent years have witnessed a burgeoning interest in federated learning (FL). However, the contexts in which clients engage in sequential learning remain under-explored. Bridging FL and continual learning (CL) gives rise to a challenging practical problem: federated continual learning (FCL). Existing research in FCL primarily focuses on mitigating the catastrophic forgetting issue of continual learning while collaborating with other clients. We argue that the forgetting phenomena are not invariably detrimental. In this paper, we consider a more practical and challenging FCL setting characterized by potentially unrelated or even antagonistic data/tasks across different clients. In the FL scenario, statistical heterogeneity and data noise among clients may exhibit spurious correlations which result in biased feature learning. While existing CL strategies focus on a complete utilization of previous knowledge, we find in our study that forgetting biased information is beneficial. Therefore, we propose a new concept, accurate forgetting (AF), and develop a novel generative-replay method that selectively utilizes previous knowledge in federated networks. We employ a probabilistic framework based on a normalizing flow model to quantify the credibility of previous knowledge. Comprehensive experiments affirm the superiority of our method over baselines.

* published in ICLR 2024

ACE++: Instruction-Based Image Creation and Editing via Context-Aware Content Filling

Jan 07, 2025

Chaojie Mao, Jingfeng Zhang, Yulin Pan, Zeyinzi Jiang, Zhen Han, Yu Liu, Jingren Zhou

Abstract: We report ACE++, an instruction-based diffusion framework that tackles various image generation and editing tasks. Inspired by the input format for the inpainting task proposed by FLUX.1-Fill-dev, we improve the Long-context Condition Unit (LCU) introduced in ACE and extend this input paradigm to any editing and generation tasks. To take full advantage of image generative priors, we develop a two-stage training scheme to minimize the efforts of finetuning powerful text-to-image diffusion models like FLUX.1-dev. In the first stage, we pre-train the model using task data with the 0-ref tasks from the text-to-image model. There are many models in the community based on the post-training of text-to-image foundational models that meet this training paradigm of the first stage. For example, FLUX.1-Fill-dev deals primarily with painting tasks and can be used as an initialization to accelerate the training process. In the second stage, we finetune the above model to support the general instructions using all tasks defined in ACE. To promote the widespread application of ACE++ in different scenarios, we provide a comprehensive set of models that cover both full finetuning and lightweight finetuning, while considering general applicability and applicability in vertical scenarios. The qualitative analysis showcases the superiority of ACE++ in terms of generated image quality and prompt-following ability. Code and models will be available on the project page: https://ali-vilab.github.io/ACE_plus_page/.

Dissecting Misalignment of Multimodal Large Language Models via Influence Function

Nov 18, 2024

Lijie Hu, Chenyang Ren, Huanyi Xie, Khouloud Saadi, Shu Yang, Jingfeng Zhang, Di Wang

Abstract: Multi-modal Large Language models (MLLMs) are always trained on data from diverse and unreliable sources, which may contain misaligned or mislabeled text-image pairs. This frequently causes robustness issues and hallucinations, leading to performance degradation. Data valuation is an efficient way to detect and trace these misalignments. Nevertheless, existing methods are computationally expensive for MLLMs. While computationally efficient, the classical influence functions are inadequate for contrastive learning models because they were originally designed for pointwise loss. Additionally, contrastive learning involves minimizing the distance between the modalities of positive samples and maximizing the distance between the modalities of negative samples. This requires us to evaluate the influence of samples from both perspectives. To tackle these challenges, we introduce the Extended Influence Function for Contrastive Loss (ECIF), an influence function crafted for contrastive loss. ECIF considers both positive and negative samples and provides a closed-form approximation of contrastive learning models, eliminating the need for retraining. Building upon ECIF, we develop a series of algorithms for data evaluation in MLLMs, misalignment detection, and misprediction trace-back tasks. Experimental results demonstrate that ECIF advances the transparency and interpretability of MLLMs by offering a more accurate assessment of data impact and model alignment compared to traditional baseline methods.

* 34 pages
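
The abstract above contrasts pointwise losses with contrastive losses, in which every sample acts both as a positive (for its own pair) and as a negative (for all other pairs). The sketch below is not ECIF itself; it only illustrates that pairwise structure using a standard InfoNCE-style loss in the image-to-text direction. The function name `info_nce`, the similarity-matrix input, and the temperature value are assumptions made for illustration.

```python
import math

def info_nce(sim, temperature=0.07):
    """InfoNCE-style loss over a square similarity matrix (illustrative only).

    sim[i][j] is the similarity between image i and text j. The diagonal
    entries are the positive pairs; every off-diagonal entry in a row acts
    as a negative for that row's image.
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        logits = [s / temperature for s in sim[i]]
        # Numerically stable log-sum-exp over the row (positive + negatives).
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Pull the positive pair together, push the negatives apart.
        total += log_denom - logits[i]
    return total / n

aligned = [[1.0, 0.0], [0.0, 1.0]]      # each image matches its own text
mislabeled = [[0.0, 1.0], [1.0, 0.0]]   # pairs are swapped
print(info_nce(aligned) < info_nce(mislabeled))
```

Because each row's loss depends on all other samples in the batch, removing or changing one text-image pair affects the loss terms of the other pairs as well, which is why a pointwise influence function does not carry over directly.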

ColorEdit: Training-free Image-Guided Color editing with diffusion model

Nov 15, 2024

Xingxi Yin, Zhi Li, Jingfeng Zhang, Chenglin Li, Yin Zhang

Abstract: Text-to-image (T2I) diffusion models, with their impressive generative capabilities, have been adopted for image editing tasks, demonstrating remarkable efficacy. However, due to attention leakage and collision between the cross-attention map of the object and the new color attribute from the text prompt, text-guided image editing methods may fail to change the color of an object, resulting in a misalignment between the resulting image and the text prompt. In this paper, we conduct an in-depth analysis of the process of text-guided image synthesis and of the semantic information that different cross-attention blocks have learned. We observe that the visual representation of an object is determined in the up-block of the diffusion model in the early stage of the denoising process, and color adjustment can be achieved through value matrices alignment in the cross-attention layer. Based on our findings, we propose a straightforward, yet stable, and effective image-guided method to modify the color of an object without requiring any additional fine-tuning or training. Lastly, we present a benchmark dataset called COLORBENCH, the first benchmark to evaluate the performance of color change methods. Extensive experiments validate the effectiveness of our method in object-level color editing and show that it surpasses popular text-guided image editing approaches on both synthesized and real images.

An Individual Identity-Driven Framework for Animal Re-Identification

Oct 30, 2024

Yihao Wu, Di Zhao, Jingfeng Zhang, Yun Sing Koh

Abstract: Reliable re-identification of individuals within large wildlife populations is crucial for biological studies, ecological research, and wildlife conservation. Classic computer vision techniques offer a promising direction for Animal Re-identification (Animal ReID), but their backbones' close-set nature limits their applicability and generalizability. Despite the demonstrated effectiveness of vision-language models like CLIP in re-identifying persons and vehicles, their application to Animal ReID remains limited due to unique challenges, such as the various visual representations of animals, including variations in poses and forms. To address these limitations, we leverage CLIP's cross-modal capabilities to introduce a two-stage framework, the Individual Animal IDentity-Driven (IndivAID) framework, specifically designed for Animal ReID. In the first stage, IndivAID trains a text description generator by extracting individual semantic information from each image, generating both image-specific and individual-specific textual descriptions that fully capture the diverse visual concepts of each individual across animal images. In the second stage, IndivAID refines its learning of visual concepts by dynamically incorporating individual-specific textual descriptions with an integrated attention module to further highlight discriminative features of individuals for Animal ReID. Evaluation against state-of-the-art methods across eight benchmark datasets and a real-world Stoat dataset demonstrates IndivAID's effectiveness and applicability. Code is available at https://github.com/ywu840/IndivAID.

* 10 pages

Towards Multi-dimensional Explanation Alignment for Medical Classification

Oct 28, 2024

Lijie Hu, Songning Lai, Wenshuo Chen, Hongru Xiao, Hongbin Lin, Lu Yu, Jingfeng Zhang, Di Wang

Abstract: The lack of interpretability in the field of medical image analysis has significant ethical and legal implications. Existing interpretable methods in this domain encounter several challenges, including dependency on specific models, difficulties in understanding and visualization, as well as issues related to efficiency. To address these limitations, we propose a novel framework called Med-MICN (Medical Multi-dimensional Interpretable Concept Network). Med-MICN provides interpretability alignment for various angles, including neural symbolic reasoning, concept semantics, and saliency maps, which are superior to current interpretable methods. Its advantages include high prediction accuracy, interpretability across multiple dimensions, and automation through an end-to-end concept labeling process that reduces the need for extensive human training effort when working with new datasets. To demonstrate the effectiveness and interpretability of Med-MICN, we apply it to four benchmark datasets and compare it with baselines. The results clearly demonstrate the superior performance and interpretability of our Med-MICN.

* Accepted by NeurIPS 2024

ChatLogic: Integrating Logic Programming with Large Language Models for Multi-Step Reasoning

Jul 14, 2024

Zhongsheng Wang, Jiamou Liu, Qiming Bao, Hongfei Rong, Jingfeng Zhang

Abstract: Large language models (LLMs) such as ChatGPT and GPT-4 have demonstrated impressive capabilities in various generative tasks. However, their performance is often hampered by limitations in accessing and leveraging long-term memory, leading to specific vulnerabilities and biases, especially during long interactions. This paper introduces ChatLogic, an innovative framework specifically targeted at LLM reasoning tasks that can enhance the performance of LLMs in multi-step deductive reasoning tasks by integrating logic programming. In ChatLogic, the language model plays a central role, acting as a controller and participating in every system operation stage. We propose a novel method of converting logic problems into symbolic integration with an inference engine. This approach leverages large language models' situational understanding and imitation skills and uses symbolic memory to enhance multi-step deductive reasoning capabilities. Our results show that the ChatLogic framework significantly improves the multi-step reasoning capabilities of LLMs. The source code and data are available at https://github.com/Strong-AI-Lab/ChatLogic.

* 8 pages, 3 figures. This paper has been accepted by WCCI IJCNN 2024

Privacy-Preserving Heterogeneous Federated Learning for Sensitive Healthcare Data

Jun 15, 2024

Yukai Xu, Jingfeng Zhang, Yujie Gu

Abstract: In the realm of healthcare where decentralized facilities are prevalent, machine learning faces two major challenges concerning the protection of data and models. The data-level challenge concerns the data privacy leakage when centralizing data with sensitive personal information. The model-level challenge arises from the heterogeneity of local models, which need to be collaboratively trained while ensuring their confidentiality to address intellectual property concerns. To tackle these challenges, we propose a new framework termed Abstention-Aware Federated Voting (AAFV) that can collaboratively and confidentially train heterogeneous local models while simultaneously protecting the data privacy. This is achieved by integrating a novel abstention-aware voting mechanism and a differential privacy mechanism onto local models' predictions. In particular, the proposed abstention-aware voting mechanism exploits a threshold-based abstention method to select high-confidence votes from heterogeneous local models, which not only enhances the learning utility but also protects model confidentiality. Furthermore, we implement AAFV on two practical prediction tasks of diabetes and in-hospital patient mortality. The experiments demonstrate the effectiveness and confidentiality of AAFV in testing accuracy and privacy protection.

* Accepted to the 2024 IEEE Conference on Artificial Intelligence (IEEE CAI 2024)
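
The threshold-based abstention voting described in the abstract above can be pictured with a small sketch. This is an illustration under stated assumptions, not the AAFV implementation: the function name `abstention_aware_vote`, the threshold of 0.7, and the use of Laplace noise on the vote counts as the differential privacy mechanism are all hypothetical choices made here for the example.

```python
import random

def abstention_aware_vote(client_probs, threshold=0.7, noise_scale=0.0, rng=None):
    """Aggregate predictions from heterogeneous local models (illustrative).

    client_probs: one list of class probabilities per local model.
    A model votes for its top class only if its confidence reaches
    `threshold`; otherwise it abstains. If noise_scale > 0, Laplace noise
    with that scale is added to each vote count (an assumed DP mechanism).
    Returns (label, counts), or (None, counts) if every model abstained.
    """
    rng = rng or random.Random(0)
    num_classes = len(client_probs[0])
    counts = [0.0] * num_classes
    for probs in client_probs:
        top = max(range(num_classes), key=lambda c: probs[c])
        if probs[top] >= threshold:  # low-confidence models abstain
            counts[top] += 1.0
    if sum(counts) == 0:
        return None, counts
    if noise_scale > 0:
        # A Laplace(0, b) sample is the difference of two Exp(1/b) samples.
        rate = 1.0 / noise_scale
        counts = [c + rng.expovariate(rate) - rng.expovariate(rate) for c in counts]
    label = max(range(num_classes), key=lambda c: counts[c])
    return label, counts

probs = [[0.9, 0.1], [0.55, 0.45], [0.2, 0.8], [0.95, 0.05]]
print(abstention_aware_vote(probs))  # the second model abstains
```

Only the aggregated (and optionally noised) vote counts leave the function, so the sketch never needs access to model weights or architectures, which is the property that allows heterogeneous local models to participate.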

Improving Accuracy-robustness Trade-off via Pixel Reweighted Adversarial Training

Jun 02, 2024

Jiacheng Zhang, Feng Liu, Dawei Zhou, Jingfeng Zhang, Tongliang Liu

Abstract: Adversarial training (AT) trains models using adversarial examples (AEs), which are natural images modified with specific perturbations to mislead the model. These perturbations are constrained by a predefined perturbation budget ε and are equally applied to each pixel within an image. However, in this paper, we discover that not all pixels contribute equally to the accuracy on AEs (i.e., robustness) and accuracy on natural images (i.e., accuracy). Motivated by this finding, we propose Pixel-reweighted AdveRsarial Training (PART), a new framework that partially reduces ε for less influential pixels, guiding the model to focus more on key regions that affect its outputs. Specifically, we first use class activation mapping (CAM) methods to identify important pixel regions, then we keep the perturbation budget for these regions while lowering it for the remaining regions when generating AEs. In the end, we use these pixel-reweighted AEs to train a model. PART achieves a notable improvement in accuracy without compromising robustness on CIFAR-10, SVHN and TinyImagenet-200, justifying the necessity to allocate distinct weights to different pixel regions in robust classification.
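
The idea of keeping the perturbation budget for important pixels and lowering it elsewhere can be sketched as a per-pixel budget map followed by an element-wise clip. This is a minimal illustration, not the PART implementation: the importance scores stand in for a CAM output, and the names `pixel_budgets` and `project`, the reduced budget of 6/255, and the 50% keep fraction are assumptions chosen for the example.

```python
def pixel_budgets(importance, eps=8 / 255, eps_low=6 / 255, keep_fraction=0.5):
    """Build a per-pixel perturbation budget from an importance map.

    importance: 2D list of scores (a stand-in for a CAM heat map).
    The top `keep_fraction` of pixels keep the full budget `eps`;
    the remaining pixels get the reduced budget `eps_low`.
    """
    flat = sorted((v for row in importance for v in row), reverse=True)
    k = max(1, int(len(flat) * keep_fraction))
    cutoff = flat[k - 1]  # score of the k-th most important pixel
    return [[eps if v >= cutoff else eps_low for v in row] for row in importance]

def project(delta, budgets):
    """Clip each pixel's perturbation to its own budget (element-wise)."""
    return [
        [max(-b, min(b, d)) for d, b in zip(delta_row, budget_row)]
        for delta_row, budget_row in zip(delta, budgets)
    ]

importance = [[0.9, 0.1], [0.8, 0.2]]
budgets = pixel_budgets(importance)
delta = [[1.0, -1.0], [0.0, 0.01]]
print(project(delta, budgets))
```

In an attack loop such as PGD, `project` would replace the usual uniform clip to [-ε, ε] after each gradient step, so less influential pixels end up with smaller perturbations.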

Text Guided Image Editing with Automatic Concept Locating and Forgetting

May 30, 2024

Jia Li, Lijie Hu, Zhixian He, Jingfeng Zhang, Tianhang Zheng, Di Wang

Abstract: With the advancement of image-to-image diffusion models guided by text, significant progress has been made in image editing. However, a persistent challenge remains in seamlessly incorporating objects into images based on textual instructions, without relying on extra user-provided guidance. Text and images are inherently distinct modalities, which makes it difficult to fully capture the semantic intent conveyed through language and to accurately translate it into the desired visual modifications. Therefore, text-guided image editing models often produce generations with residual object attributes that do not fully align with human expectations. To address this challenge, models should comprehend the image content well enough to avoid a disconnect between the provided textual editing prompts and the actual modifications made to the image. In our paper, we propose a novel method called Locate and Forget (LaF), which effectively locates potential target concepts in the image for modification by comparing the syntactic trees of the target prompt and scene descriptions in the input image, with the aim of forgetting clues to their existence in the generated image. Compared to the baselines, our method demonstrates its superiority in text-guided image editing tasks both qualitatively and quantitatively.
