Haohan Wang

Conflict-Aware Adversarial Training

Oct 21, 2024

Zhiyu Xue, Haohan Wang, Yao Qin, Ramtin Pedarsani

Abstract: Adversarial training is the most effective method to obtain adversarial robustness for deep neural networks by directly involving adversarial samples in the training procedure. To obtain an accurate and robust model, the weighted-average method is applied to optimize standard loss and adversarial loss simultaneously. In this paper, we argue that the weighted-average method does not provide the best trade-off between standard performance and adversarial robustness. We attribute this failure to the conflict between the gradients derived from the standard and adversarial losses, and further demonstrate, theoretically and practically, that this conflict increases with the attack budget. To alleviate this problem, we propose a new trade-off paradigm for adversarial training with a conflict-aware factor for the convex combination of standard and adversarial loss, named Conflict-Aware Adversarial Training (CA-AT). Comprehensive experimental results show that CA-AT consistently offers a superior trade-off between standard performance and adversarial robustness under the settings of adversarial training from scratch and parameter-efficient finetuning.
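The gradient conflict that the abstract refers to can be illustrated with a small sketch. This is not the CA-AT algorithm itself: it only measures the conflict as a cosine similarity and shows what a fixed weighted average does when the two gradients disagree. The function names, the weight `lam`, and the toy gradient values are assumptions made for illustration.

```python
import math

def dot(u, v):
    """Inner product of two equally sized vectors."""
    return sum(a * b for a, b in zip(u, v))

def cosine_similarity(u, v):
    """Cosine of the angle between two gradient vectors (0.0 if either is zero)."""
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm > 0 else 0.0

def weighted_average_gradient(g_std, g_adv, lam):
    """Gradient of (1 - lam) * L_std + lam * L_adv for a fixed lam in [0, 1]."""
    return [(1 - lam) * a + lam * b for a, b in zip(g_std, g_adv)]

# Toy gradients (made-up numbers) that point in partly opposing directions.
g_std = [1.0, 0.0]
g_adv = [-1.0, 1.0]

# A negative cosine means the standard and adversarial gradients conflict.
conflict = cosine_similarity(g_std, g_adv)
g_avg = weighted_average_gradient(g_std, g_adv, lam=0.5)

# To first order, a step along -g_avg lowers each loss by dot(g, g_avg).
std_progress = dot(g_std, g_avg)
adv_progress = dot(g_adv, g_avg)
```

In this toy case the averaged step makes no first-order progress on the standard loss (`std_progress` is zero) even though both losses received equal weight, which is the kind of failure a conflict-aware weighting is meant to address.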

DistDD: Distributed Data Distillation Aggregation through Gradient Matching

Oct 11, 2024

Peiran Wang, Haohan Wang

Abstract: In this paper, we introduce DistDD, a novel approach within the federated learning (FL) framework that reduces the need for repetitive communication by distilling data directly on clients' devices. Unlike traditional federated learning, which requires iterative model updates across nodes, DistDD performs a one-time distillation process that extracts a global distilled dataset, maintaining the privacy standards of federated learning while significantly cutting communication costs. By leveraging DistDD's distilled dataset, developers of FL systems can achieve just-in-time parameter tuning and neural architecture search (NAS) over FL without repeating the whole FL process multiple times. We provide a detailed convergence proof of the DistDD algorithm, reinforcing its mathematical stability and reliability for practical applications. Our experiments demonstrate the effectiveness and robustness of DistDD, particularly in non-i.i.d. and mislabeled data scenarios, showcasing its potential to handle complex real-world data challenges differently from conventional federated learning methods. We also evaluate DistDD in the NAS use case and show its effectiveness and communication savings.

Simple Unsupervised Knowledge Distillation With Space Similarity

Sep 20, 2024

Aditya Singh, Haohan Wang

Abstract: According to recent studies, self-supervised learning (SSL) does not readily extend to smaller architectures. One direction to mitigate this shortcoming while simultaneously training a smaller network without labels is to adopt unsupervised knowledge distillation (UKD). Existing UKD approaches handcraft preservation-worthy inter/intra-sample relationships between the teacher and its student. However, this may overlook other key relationships present in the mapping of a teacher. In this paper, instead of heuristically constructing preservation-worthy relationships between samples, we directly motivate the student to model the teacher's embedding manifold. If the mapped manifold is similar, all inter/intra-sample relationships are indirectly conserved. We first demonstrate that prior methods cannot preserve the teacher's latent manifold due to their sole reliance on $L_2$ normalised embedding features. Subsequently, we propose a simple objective to capture the information lost due to normalisation. Our proposed loss component, termed space similarity, motivates each dimension of a student's feature space to be similar to the corresponding dimension of its teacher. We perform extensive experiments demonstrating strong performance of our proposed approach on various benchmarks.
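The difference between comparing normalised per-sample features and comparing per-dimension "space" vectors can be sketched as follows. This is an illustrative reading of the abstract, not the authors' implementation: the use of cosine similarity for both terms, the function names, and the toy embeddings are assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den > 0 else 0.0

def feature_similarity_loss(student, teacher):
    """1 - mean cosine over samples (rows); blind to each sample's scale."""
    sims = [cosine(s, t) for s, t in zip(student, teacher)]
    return 1.0 - sum(sims) / len(sims)

def space_similarity_loss(student, teacher):
    """1 - mean cosine over embedding dimensions (columns) across the batch."""
    s_cols = list(zip(*student))  # transpose: one vector per dimension
    t_cols = list(zip(*teacher))
    sims = [cosine(s, t) for s, t in zip(s_cols, t_cols)]
    return 1.0 - sum(sims) / len(sims)

# Toy batch: rows are samples, columns are embedding dimensions.
teacher = [[1.0, 2.0], [3.0, 1.0]]
# The student doubles the first sample: directions match, magnitudes do not.
student = [[2.0, 4.0], [3.0, 1.0]]

feat_loss = feature_similarity_loss(student, teacher)
space_loss = space_similarity_loss(student, teacher)
```

Here the per-sample loss is essentially zero because normalisation hides the rescaling of the first sample, while the per-dimension loss stays positive, which is one way to see the information that normalised features alone can miss.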

DS-ViT: Dual-Stream Vision Transformer for Cross-Task Distillation in Alzheimer's Early Diagnosis

Sep 11, 2024

Ke Chen, Yifeng Wang, Yufei Zhou, Haohan Wang

Abstract: In the field of Alzheimer's disease diagnosis, segmentation and classification tasks are inherently interconnected. Sharing knowledge between models for these tasks can significantly improve training efficiency, particularly when training data is scarce. However, traditional knowledge distillation techniques often struggle to bridge the gap between segmentation and classification due to the distinct nature of tasks and different model architectures. To address this challenge, we propose a dual-stream pipeline that facilitates cross-task and cross-architecture knowledge sharing. Our approach introduces a dual-stream embedding module that unifies feature representations from segmentation and classification models, enabling dimensional integration of these features to guide the classification model. We validated our method on multiple 3D datasets for Alzheimer's disease diagnosis, demonstrating significant improvements in classification performance, especially on small datasets. Furthermore, we extended our pipeline with a residual temporal attention mechanism for early diagnosis, utilizing images taken before the atrophy of patients' brain mass. This advancement shows promise in enabling diagnosis approximately six months earlier in mild and asymptomatic stages, offering critical time for intervention.

* 8 pages, 3 figures, 3 tables

A Quantitative Approach for Evaluating Disease Focus and Interpretability of Deep Learning Models for Alzheimer's Disease Classification

Sep 07, 2024

Thomas Yu Chow Tam, Litian Liang, Ke Chen, Haohan Wang, Wei Wu

Abstract: Deep learning (DL) models have shown significant potential in Alzheimer's Disease (AD) classification. However, understanding and interpreting these models remains challenging, which hinders their adoption in clinical practice. Techniques such as saliency maps have proven effective in providing visual and empirical clues about how these models work, but a gap remains in understanding which specific brain regions DL models focus on and whether these brain regions are pathologically associated with AD. To bridge this gap, we developed a quantitative disease-focusing strategy that first enhances the interpretability of DL models using saliency maps and brain segmentations; we then propose a disease-focus (DF) score that quantifies how much a DL model focuses on brain areas relevant to AD pathology, based on clinically known MRI-based pathological regions of AD. Using this strategy, we compared several state-of-the-art DL models, including a baseline 3D ResNet model, a pretrained MedicalNet model, and a MedicalNet with data augmentation, to classify patients with AD vs. cognitively normal patients using MRI data; we then evaluated these models in terms of their abilities to focus on disease-relevant regions. Our results show interesting disease-focusing patterns with different models, particularly characteristic patterns with the pretrained models and data augmentation, and also provide insight into their classification performance. These results suggest that the approach we developed for quantitatively assessing the abilities of DL models to focus on disease-relevant regions may help improve interpretability of these models for AD classification and facilitate their adoption for AD diagnosis in clinical practice. The code is publicly available at https://github.com/Liang-lt/ADNI.

Towards Reliable Advertising Image Generation Using Human Feedback

Aug 01, 2024

Zhenbang Du, Wei Feng, Haohan Wang, Yaoyu Li, Jingsen Wang, Jian Li, Zheng Zhang, Jingjing Lv, Xin Zhu, Junsheng Jin (+3 more)

Abstract: In the e-commerce realm, compelling advertising images are pivotal for attracting customer attention. While generative models automate image generation, they often produce substandard images that may mislead customers and require significant labor costs to inspect. This paper focuses on increasing the rate of available generated images. We first introduce a multi-modal Reliable Feedback Network (RFNet) to automatically inspect the generated images. Integrating RFNet into a recurrent process, Recurrent Generation, results in a higher number of available advertising images. To further enhance production efficiency, we fine-tune diffusion models with an innovative Consistent Condition regularization utilizing the feedback from RFNet (RFFT). This results in a remarkable increase in the available rate of generated images, reducing the number of attempts in Recurrent Generation and providing a highly efficient production process without sacrificing visual appeal. We also construct a Reliable Feedback 1 Million (RF1M) dataset, which comprises over one million generated advertising images annotated by humans and helps train RFNet to accurately assess the availability of generated images and faithfully reflect human feedback. Overall, our approach offers a reliable solution for advertising image generation.

* ECCV2024
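The generate-then-inspect loop that the abstract calls Recurrent Generation can be sketched as a simple retry loop. This is only a schematic under stated assumptions: the function names, the `max_attempts` budget, and the toy generator and inspector are hypothetical stand-ins, and the real RFNet and diffusion model are not modelled here.

```python
from typing import Callable, Optional, Tuple

def recurrent_generation(
    generate: Callable[[int], str],
    inspect: Callable[[str], bool],
    max_attempts: int = 5,
) -> Tuple[Optional[str], int]:
    """Regenerate until the inspector accepts an image or the budget runs out.

    Returns (accepted_image_or_None, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        image = generate(attempt)
        if inspect(image):
            return image, attempt
    return None, max_attempts

# Hypothetical generator: only produces a usable image from the third try on.
def toy_generate(attempt: int) -> str:
    return "good-image" if attempt >= 3 else f"bad-image-{attempt}"

# Hypothetical inspector playing the role the abstract assigns to RFNet.
def toy_inspect(image: str) -> bool:
    return image.startswith("good")

accepted = recurrent_generation(toy_generate, toy_inspect)
exhausted = recurrent_generation(toy_generate, toy_inspect, max_attempts=2)
```

The number of attempts returned by such a loop is the quantity the abstract says fine-tuning with RFNet feedback is meant to reduce.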

Quantitative Evaluation of the Saliency Map for Alzheimer's Disease Classifier with Anatomical Segmentation

Jul 11, 2024

Yihan Zhang, Xuanshuo Zhang, Wei Wu, Haohan Wang

Abstract: Saliency maps have been widely used to interpret deep learning classifiers for Alzheimer's disease (AD). However, since AD is heterogeneous and has multiple subtypes, the pathological mechanism of AD is still not fully understood and may vary from patient to patient. Due to the lack of such understanding, it is difficult to comprehensively and effectively assess the saliency maps of AD classifiers. In this paper, we use anatomical segmentation to allocate saliency values to different brain regions. By plotting the distributions of saliency maps corresponding to AD and NC (Normal Control), we can gain a comprehensive view of the model's decision process. To leverage the fact that brain volume shrinkage happens in AD patients during disease progression, we define a new evaluation metric, the brain volume change score (VCS), by computing the average Pearson correlation between the brain volume changes and the saliency values of a model in different brain regions for each patient. Thus, the VCS metric can help us understand how saliency maps resulting from different models relate to the changes of the volumes across different regions in the whole brain. We trained candidate models on the ADNI dataset and tested them on three different datasets. Our results indicate that (i) models with higher VCSs tend to demonstrate saliency maps with more details relevant to the AD pathology, and (ii) using gradient-based adversarial training strategies such as FGSM and stochastic masking can improve the VCSs of the models.
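The VCS definition in the abstract (an average over patients of the Pearson correlation between regional volume changes and regional saliency values) can be sketched as below. The data layout (one row per patient, one column per brain region), the handling of constant inputs, and all numbers are assumptions for illustration; the paper's exact preprocessing is not reproduced.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equally sized sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def volume_change_score(volume_changes, saliency):
    """Average, over patients, of the per-patient correlation across regions."""
    scores = [pearson(v, s) for v, s in zip(volume_changes, saliency)]
    return sum(scores) / len(scores)

# Made-up values: rows are patients, columns are brain regions.
volume_changes = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
saliency = [[2.0, 4.0, 6.0], [1.0, 2.0, 3.0]]

per_patient = [pearson(v, s) for v, s in zip(volume_changes, saliency)]
vcs = volume_change_score(volume_changes, saliency)
```

In this toy data the first patient's saliency tracks the volume changes perfectly and the second patient's runs opposite to them, so the two correlations cancel in the average.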

JailbreakZoo: Survey, Landscapes, and Horizons in Jailbreaking Large Language and Vision-Language Models

Jun 26, 2024

Haibo Jin, Leyang Hu, Xinuo Li, Peiyan Zhang, Chonghan Chen, Jun Zhuang, Haohan Wang

Abstract: The rapid evolution of artificial intelligence (AI) through developments in Large Language Models (LLMs) and Vision-Language Models (VLMs) has brought significant advancements across various technological domains. While these models enhance capabilities in natural language processing and visual interactive tasks, their growing adoption raises critical concerns regarding security and ethical alignment. This survey provides an extensive review of the emerging field of jailbreaking (deliberately circumventing the ethical and operational boundaries of LLMs and VLMs) and the consequent development of defense mechanisms. Our study categorizes jailbreaks into seven distinct types and elaborates on defense strategies that address these vulnerabilities. Through this comprehensive examination, we identify research gaps and propose directions for future studies to enhance the security frameworks of LLMs and VLMs. Our findings underscore the necessity for a unified perspective that integrates both jailbreak strategies and defensive solutions to foster a robust, secure, and reliable environment for the next generation of language models. More details can be found on our website: https://chonghan-chen.com/llm-jailbreak-zoo-survey/.

* 44 pages

GenoTEX: A Benchmark for Evaluating LLM-Based Exploration of Gene Expression Data in Alignment with Bioinformaticians

Jun 21, 2024

Haoyang Liu, Haohan Wang

Abstract: Recent advancements in machine learning have significantly improved the identification of disease-associated genes from gene expression datasets. However, these processes often require extensive expertise and manual effort, limiting their scalability. Large Language Model (LLM)-based agents have shown promise in automating these tasks due to their increasing problem-solving abilities. To support the evaluation and development of such methods, we introduce GenoTEX, a benchmark dataset for the automatic exploration of gene expression data, involving the tasks of dataset selection, preprocessing, and statistical analysis. GenoTEX provides annotated code and results for solving a wide range of gene identification problems, in a full analysis pipeline that follows the standard of computational genomics. These annotations are curated by human bioinformaticians who carefully analyze the datasets to ensure accuracy and reliability. To provide baselines for these tasks, we present GenoAgents, a team of LLM-based agents designed with context-aware planning, iterative correction, and domain expert consultation to collaboratively explore gene datasets. Our experiments with GenoAgents demonstrate the potential of LLM-based approaches in genomics data analysis, while error analysis highlights the challenges and areas for future improvement. We propose GenoTEX as a promising resource for benchmarking and enhancing AI-driven methods for genomics data analysis. We make our benchmark publicly available at https://github.com/Liu-Hy/GenoTex.

* 25 pages, 3 figures

Deconstructing The Ethics of Large Language Models from Long-standing Issues to New-emerging Dilemmas

Jun 08, 2024

Chengyuan Deng, Yiqun Duan, Xin Jin, Heng Chang, Yijun Tian, Han Liu, Henry Peng Zou, Yiqiao Jin, Yijia Xiao, Yichen Wang (+7 more)

Abstract: Large Language Models (LLMs) have achieved unparalleled success across diverse language modeling tasks in recent years. However, this progress has also intensified ethical concerns, impacting the deployment of LLMs in everyday contexts. This paper provides a comprehensive survey of ethical challenges associated with LLMs, from longstanding issues such as copyright infringement, systematic bias, and data privacy, to emerging problems like truthfulness and social norms. We critically analyze existing research aimed at understanding, examining, and mitigating these ethical risks. Our survey underscores the importance of integrating ethical standards and societal values into the development of LLMs, thereby guiding the development of responsible and ethically aligned language models.
