Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Mingjie Li

SemiDDM-Weather: A Semi-supervised Learning Framework for All-in-one Adverse Weather Removal

Sep 29, 2024

Fang Long, Wenkang Su, Zixuan Li, Lei Cai, Mingjie Li, Yuan-Gen Wang, Xiaochun Cao

Figure 1 for SemiDDM-Weather: A Semi-supervised Learning Framework for All-in-one Adverse Weather Removal

Figure 2 for SemiDDM-Weather: A Semi-supervised Learning Framework for All-in-one Adverse Weather Removal

Figure 3 for SemiDDM-Weather: A Semi-supervised Learning Framework for All-in-one Adverse Weather Removal

Figure 4 for SemiDDM-Weather: A Semi-supervised Learning Framework for All-in-one Adverse Weather Removal

Abstract:Adverse weather removal aims to restore clear vision under adverse weather conditions. Existing methods are mostly tailored for specific weather types and rely heavily on extensive labeled data. In dealing with these two limitations, this paper presents a pioneering semi-supervised all-in-one adverse weather removal framework built on the teacher-student network with a Denoising Diffusion Model (DDM) as the backbone, termed SemiDDM-Weather. As for the design of DDM backbone in our SemiDDM-Weather, we adopt the SOTA Wavelet Diffusion Model-Wavediff with customized inputs and loss functions, devoted to facilitating the learning of many-to-one mapping distributions for efficient all-in-one adverse weather removal with limited label data. To mitigate the risk of misleading model training due to potentially inaccurate pseudo-labels generated by the teacher network in semi-supervised learning, we introduce quality assessment and content consistency constraints to screen the "optimal" outputs from the teacher network as the pseudo-labels, thus more effectively guiding the student network training with unlabeled data. Experimental results show that on both synthetic and real-world datasets, our SemiDDM-Weather consistently delivers high visual quality and superior adverse weather removal, even when compared to fully supervised competitors. Our code and pre-trained model are available at this repository.

Via

Access Paper or Ask Questions

TERD: A Unified Framework for Safeguarding Diffusion Models Against Backdoors

Sep 09, 2024

Yichuan Mo, Hui Huang, Mingjie Li, Ang Li, Yisen Wang

Figure 1 for TERD: A Unified Framework for Safeguarding Diffusion Models Against Backdoors

Figure 2 for TERD: A Unified Framework for Safeguarding Diffusion Models Against Backdoors

Figure 3 for TERD: A Unified Framework for Safeguarding Diffusion Models Against Backdoors

Figure 4 for TERD: A Unified Framework for Safeguarding Diffusion Models Against Backdoors

Abstract:Diffusion models have achieved notable success in image generation, but they remain highly vulnerable to backdoor attacks, which compromise their integrity by producing specific undesirable outputs when presented with a pre-defined trigger. In this paper, we investigate how to protect diffusion models from this dangerous threat. Specifically, we propose TERD, a backdoor defense framework that builds unified modeling for current attacks, which enables us to derive an accessible reversed loss. A trigger reversion strategy is further employed: an initial approximation of the trigger through noise sampled from a prior distribution, followed by refinement through differential multi-step samplers. Additionally, with the reversed trigger, we propose backdoor detection from the noise space, introducing the first backdoor input detection approach for diffusion models and a novel model detection algorithm that calculates the KL divergence between reversed and benign distributions. Extensive evaluations demonstrate that TERD secures a 100% True Positive Rate (TPR) and True Negative Rate (TNR) across datasets of varying resolutions. TERD also demonstrates nice adaptability to other Stochastic Differential Equation (SDE)-based models. Our code is available at https://github.com/PKU-ML/TERD.

* International Conference on Machine Learning 2024

Via

Access Paper or Ask Questions

Medical Report Generation Is A Multi-label Classification Problem

Aug 30, 2024

Yijian Fan, Zhenbang Yang, Rui Liu, Mingjie Li, Xiaojun Chang

Figure 1 for Medical Report Generation Is A Multi-label Classification Problem

Figure 2 for Medical Report Generation Is A Multi-label Classification Problem

Figure 3 for Medical Report Generation Is A Multi-label Classification Problem

Figure 4 for Medical Report Generation Is A Multi-label Classification Problem

Abstract:Medical report generation is a critical task in healthcare that involves the automatic creation of detailed and accurate descriptions from medical images. Traditionally, this task has been approached as a sequence generation problem, relying on vision-and-language techniques to generate coherent and contextually relevant reports. However, in this paper, we propose a novel perspective: rethinking medical report generation as a multi-label classification problem. By framing the task this way, we leverage the radiology nodes from the commonly used knowledge graph, which can be better captured through classification techniques. To verify our argument, we introduce a novel report generation framework based on BLIP integrated with classified key nodes, which allows for effective report generation with accurate classification of multiple key aspects within the medical images. This approach not only simplifies the report generation process but also significantly enhances performance metrics. Our extensive experiments demonstrate that leveraging key nodes can achieve state-of-the-art (SOTA) performance, surpassing existing approaches across two benchmark datasets. The results underscore the potential of re-envisioning traditional tasks with innovative methodologies, paving the way for more efficient and accurate medical report generation.

* Accepted to 2024 IEEE International Conference on Medical Artificial Intelligence

Via

Access Paper or Ask Questions

Image Super-Resolution with Taylor Expansion Approximation and Large Field Reception

Aug 01, 2024

Jiancong Feng, Yuan-Gen Wang, Mingjie Li, Fengchuang Xing

Figure 1 for Image Super-Resolution with Taylor Expansion Approximation and Large Field Reception

Figure 2 for Image Super-Resolution with Taylor Expansion Approximation and Large Field Reception

Figure 3 for Image Super-Resolution with Taylor Expansion Approximation and Large Field Reception

Figure 4 for Image Super-Resolution with Taylor Expansion Approximation and Large Field Reception

Abstract:Self-similarity techniques are booming in blind super-resolution (SR) due to accurate estimation of the degradation types involved in low-resolution images. However, high-dimensional matrix multiplication within self-similarity computation prohibitively consumes massive computational costs. We find that the high-dimensional attention map is derived from the matrix multiplication between Query and Key, followed by a softmax function. This softmax makes the matrix multiplication between Query and Key inseparable, posing a great challenge in simplifying computational complexity. To address this issue, we first propose a second-order Taylor expansion approximation (STEA) to separate the matrix multiplication of Query and Key, resulting in the complexity reduction from $\mathcal{O}(N^2)$ to $\mathcal{O}(N)$. Then, we design a multi-scale large field reception (MLFR) to compensate for the performance degradation caused by STEA. Finally, we apply these two core designs to laboratory and real-world scenarios by constructing LabNet and RealNet, respectively. Extensive experimental results tested on five synthetic datasets demonstrate that our LabNet sets a new benchmark in qualitative and quantitative evaluations. Tested on the RealWorld38 dataset, our RealNet achieves superior visual quality over existing methods. Ablation studies further verify the contributions of STEA and MLFR towards both LabNet and RealNet frameworks.

Via

Access Paper or Ask Questions

Contrastive Learning with Counterfactual Explanations for Radiology Report Generation

Jul 19, 2024

Mingjie Li, Haokun Lin, Liang Qiu, Xiaodan Liang, Ling Chen, Abdulmotaleb Elsaddik, Xiaojun Chang

Figure 1 for Contrastive Learning with Counterfactual Explanations for Radiology Report Generation

Figure 2 for Contrastive Learning with Counterfactual Explanations for Radiology Report Generation

Figure 3 for Contrastive Learning with Counterfactual Explanations for Radiology Report Generation

Figure 4 for Contrastive Learning with Counterfactual Explanations for Radiology Report Generation

Abstract:Due to the common content of anatomy, radiology images with their corresponding reports exhibit high similarity. Such inherent data bias can predispose automatic report generation models to learn entangled and spurious representations resulting in misdiagnostic reports. To tackle these, we propose a novel \textbf{Co}unter\textbf{F}actual \textbf{E}xplanations-based framework (CoFE) for radiology report generation. Counterfactual explanations serve as a potent tool for understanding how decisions made by algorithms can be changed by asking ``what if'' scenarios. By leveraging this concept, CoFE can learn non-spurious visual representations by contrasting the representations between factual and counterfactual images. Specifically, we derive counterfactual images by swapping a patch between positive and negative samples until a predicted diagnosis shift occurs. Here, positive and negative samples are the most semantically similar but have different diagnosis labels. Additionally, CoFE employs a learnable prompt to efficiently fine-tune the pre-trained large language model, encapsulating both factual and counterfactual content to provide a more generalizable prompt representation. Extensive experiments on two benchmarks demonstrate that leveraging the counterfactual explanations enables CoFE to generate semantically coherent and factually complete reports and outperform in terms of language generation and clinical efficacy metrics.

* ECCV 2024

Via

Access Paper or Ask Questions

CLIPVQA:Video Quality Assessment via CLIP

Jul 06, 2024

Fengchuang Xing, Mingjie Li, Yuan-Gen Wang, Guopu Zhu, Xiaochun Cao

Figure 1 for CLIPVQA:Video Quality Assessment via CLIP

Figure 2 for CLIPVQA:Video Quality Assessment via CLIP

Figure 3 for CLIPVQA:Video Quality Assessment via CLIP

Figure 4 for CLIPVQA:Video Quality Assessment via CLIP

Abstract:In learning vision-language representations from web-scale data, the contrastive language-image pre-training (CLIP) mechanism has demonstrated a remarkable performance in many vision tasks. However, its application to the widely studied video quality assessment (VQA) task is still an open issue. In this paper, we propose an efficient and effective CLIP-based Transformer method for the VQA problem (CLIPVQA). Specifically, we first design an effective video frame perception paradigm with the goal of extracting the rich spatiotemporal quality and content information among video frames. Then, the spatiotemporal quality features are adequately integrated together using a self-attention mechanism to yield video-level quality representation. To utilize the quality language descriptions of videos for supervision, we develop a CLIP-based encoder for language embedding, which is then fully aggregated with the generated content information via a cross-attention module for producing video-language representation. Finally, the video-level quality and video-language representations are fused together for final video quality prediction, where a vectorized regression loss is employed for efficient end-to-end optimization. Comprehensive experiments are conducted on eight in-the-wild video datasets with diverse resolutions to evaluate the performance of CLIPVQA. The experimental results show that the proposed CLIPVQA achieves new state-of-the-art VQA performance and up to 37% better generalizability than existing benchmark VQA methods. A series of ablation studies are also performed to validate the effectiveness of each module in CLIPVQA.

Via

Access Paper or Ask Questions

PID: Prompt-Independent Data Protection Against Latent Diffusion Models

Jun 14, 2024

Ang Li, Yichuan Mo, Mingjie Li, Yisen Wang

Abstract:The few-shot fine-tuning of Latent Diffusion Models (LDMs) has enabled them to grasp new concepts from a limited number of images. However, given the vast amount of personal images accessible online, this capability raises critical concerns about civil privacy. While several previous defense methods have been developed to prevent such misuse of LDMs, they typically assume that the textual prompts used by data protectors exactly match those employed by data exploiters. In this paper, we first empirically demonstrate that breaking this assumption, i.e., in cases where discrepancies exist between the textual conditions used by protectors and exploiters, could substantially reduce the effectiveness of these defenses. Furthermore, considering the visual encoder's independence from textual prompts, we delve into the visual encoder and thoroughly investigate how manipulating the visual encoder affects the few-shot fine-tuning process of LDMs. Drawing on these insights, we propose a simple yet effective method called \textbf{Prompt-Independent Defense (PID)} to safeguard privacy against LDMs. We show that PID can act as a strong privacy shield on its own while requiring significantly less computational power. We believe our studies, along with the comprehensive understanding and new defense method, provide a notable advance toward reliable data protection against LDMs.

* 27 pages, ICML 2024 poster

Via

Access Paper or Ask Questions

Teaching with Uncertainty: Unleashing the Potential of Knowledge Distillation in Object Detection

Jun 11, 2024

Junfei Yi, Jianxu Mao, Tengfei Liu, Mingjie Li, Hanyu Gu, Hui Zhang, Xiaojun Chang, Yaonan Wang

Figure 1 for Teaching with Uncertainty: Unleashing the Potential of Knowledge Distillation in Object Detection

Figure 2 for Teaching with Uncertainty: Unleashing the Potential of Knowledge Distillation in Object Detection

Figure 3 for Teaching with Uncertainty: Unleashing the Potential of Knowledge Distillation in Object Detection

Figure 4 for Teaching with Uncertainty: Unleashing the Potential of Knowledge Distillation in Object Detection

Abstract:Knowledge distillation (KD) is a widely adopted and effective method for compressing models in object detection tasks. Particularly, feature-based distillation methods have shown remarkable performance. Existing approaches often ignore the uncertainty in the teacher model's knowledge, which stems from data noise and imperfect training. This limits the student model's ability to learn latent knowledge, as it may overly rely on the teacher's imperfect guidance. In this paper, we propose a novel feature-based distillation paradigm with knowledge uncertainty for object detection, termed "Uncertainty Estimation-Discriminative Knowledge Extraction-Knowledge Transfer (UET)", which can seamlessly integrate with existing distillation methods. By leveraging the Monte Carlo dropout technique, we introduce knowledge uncertainty into the training process of the student model, facilitating deeper exploration of latent knowledge. Our method performs effectively during the KD process without requiring intricate structures or extensive computational resources. Extensive experiments validate the effectiveness of our proposed approach across various distillation strategies, detectors, and backbone architectures. Specifically, following our proposed paradigm, the existing FGD method achieves state-of-the-art (SoTA) performance, with ResNet50-based GFL achieving 44.1% mAP on the COCO dataset, surpassing the baselines by 3.9%.

Via

Access Paper or Ask Questions

Predicting Genetic Mutation from Whole Slide Images via Biomedical-Linguistic Knowledge Enhanced Multi-label Classification

Jun 05, 2024

Gexin Huang, Chenfei Wu, Mingjie Li, Xiaojun Chang, Ling Chen, Ying Sun, Shen Zhao, Xiaodan Liang, Liang Lin

Abstract:Predicting genetic mutations from whole slide images is indispensable for cancer diagnosis. However, existing work training multiple binary classification models faces two challenges: (a) Training multiple binary classifiers is inefficient and would inevitably lead to a class imbalance problem. (b) The biological relationships among genes are overlooked, which limits the prediction performance. To tackle these challenges, we innovatively design a Biological-knowledge enhanced PathGenomic multi-label Transformer to improve genetic mutation prediction performances. BPGT first establishes a novel gene encoder that constructs gene priors by two carefully designed modules: (a) A gene graph whose node features are the genes' linguistic descriptions and the cancer phenotype, with edges modeled by genes' pathway associations and mutation consistencies. (b) A knowledge association module that fuses linguistic and biomedical knowledge into gene priors by transformer-based graph representation learning, capturing the intrinsic relationships between different genes' mutations. BPGT then designs a label decoder that finally performs genetic mutation prediction by two tailored modules: (a) A modality fusion module that firstly fuses the gene priors with critical regions in WSIs and obtains gene-wise mutation logits. (b) A comparative multi-label loss that emphasizes the inherent comparisons among mutation status to enhance the discrimination capabilities. Sufficient experiments on The Cancer Genome Atlas benchmark demonstrate that BPGT outperforms the state-of-the-art.

* 16 pages, 8 figures, and 3 tables

Via

Access Paper or Ask Questions

Black-box Adversarial Attacks Against Image Quality Assessment Models

Feb 28, 2024

Yu Ran, Ao-Xiang Zhang, Mingjie Li, Weixuan Tang, Yuan-Gen Wang

Figure 1 for Black-box Adversarial Attacks Against Image Quality Assessment Models

Figure 2 for Black-box Adversarial Attacks Against Image Quality Assessment Models

Figure 3 for Black-box Adversarial Attacks Against Image Quality Assessment Models

Figure 4 for Black-box Adversarial Attacks Against Image Quality Assessment Models

Abstract:The goal of No-Reference Image Quality Assessment (NR-IQA) is to predict the perceptual quality of an image in line with its subjective evaluation. To put the NR-IQA models into practice, it is essential to study their potential loopholes for model refinement. This paper makes the first attempt to explore the black-box adversarial attacks on NR-IQA models. Specifically, we first formulate the attack problem as maximizing the deviation between the estimated quality scores of original and perturbed images, while restricting the perturbed image distortions for visual quality preservation. Under such formulation, we then design a Bi-directional loss function to mislead the estimated quality scores of adversarial examples towards an opposite direction with maximum deviation. On this basis, we finally develop an efficient and effective black-box attack method against NR-IQA models. Extensive experiments reveal that all the evaluated NR-IQA models are vulnerable to the proposed attack method. And the generated perturbations are not transferable, enabling them to serve the investigation of specialities of disparate IQA models.

Via

Access Paper or Ask Questions