Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yu-Gang Jiang

Fudan University

Domain Expansion and Boundary Growth for Open-Set Single-Source Domain Generalization

Nov 05, 2024

Pengkun Jiao, Na Zhao, Jingjing Chen, Yu-Gang Jiang

Figure 1 for Domain Expansion and Boundary Growth for Open-Set Single-Source Domain Generalization

Figure 2 for Domain Expansion and Boundary Growth for Open-Set Single-Source Domain Generalization

Figure 3 for Domain Expansion and Boundary Growth for Open-Set Single-Source Domain Generalization

Figure 4 for Domain Expansion and Boundary Growth for Open-Set Single-Source Domain Generalization

Abstract:Open-set single-source domain generalization aims to use a single-source domain to learn a robust model that can be generalized to unknown target domains with both domain shifts and label shifts. The scarcity of the source domain and the unknown data distribution of the target domain pose a great challenge for domain-invariant feature learning and unknown class recognition. In this paper, we propose a novel learning approach based on domain expansion and boundary growth to expand the scarce source samples and enlarge the boundaries across the known classes that indirectly broaden the boundary between the known and unknown classes. Specifically, we achieve domain expansion by employing both background suppression and style augmentation on the source data to synthesize new samples. Then we force the model to distill consistent knowledge from the synthesized samples so that the model can learn domain-invariant information. Furthermore, we realize boundary growth across classes by using edge maps as an additional modality of samples when training multi-binary classifiers. In this way, it enlarges the boundary between the inliers and outliers, and consequently improves the unknown class recognition during open-set generalization. Extensive experiments show that our approach can achieve significant improvements and reach state-of-the-art performance on several cross-domain image classification datasets.

* TMM 2024

Via

Access Paper or Ask Questions

IDEATOR: Jailbreaking VLMs Using VLMs

Oct 29, 2024

Ruofan Wang, Bo Wang, Xingjun Ma, Yu-Gang Jiang

Figure 1 for IDEATOR: Jailbreaking VLMs Using VLMs

Figure 2 for IDEATOR: Jailbreaking VLMs Using VLMs

Figure 3 for IDEATOR: Jailbreaking VLMs Using VLMs

Figure 4 for IDEATOR: Jailbreaking VLMs Using VLMs

Abstract:As large Vision-Language Models (VLMs) continue to gain prominence, ensuring their safety deployment in real-world applications has become a critical concern. Recently, significant research efforts have focused on evaluating the robustness of VLMs against jailbreak attacks. Due to challenges in obtaining multi-modal data, current studies often assess VLM robustness by generating adversarial or query-relevant images based on harmful text datasets. However, the jailbreak images generated this way exhibit certain limitations. Adversarial images require white-box access to the target VLM and are relatively easy to defend against, while query-relevant images must be linked to the target harmful content, limiting their diversity and effectiveness. In this paper, we propose a novel jailbreak method named IDEATOR, which autonomously generates malicious image-text pairs for black-box jailbreak attacks. IDEATOR is a VLM-based approach inspired by our conjecture that a VLM itself might be a powerful red team model for generating jailbreak prompts. Specifically, IDEATOR employs a VLM to generate jailbreak texts while leveraging a state-of-the-art diffusion model to create corresponding jailbreak images. Extensive experiments demonstrate the high effectiveness and transferability of IDEATOR. It successfully jailbreaks MiniGPT-4 with a 94% success rate and transfers seamlessly to LLaVA and InstructBLIP, achieving high success rates of 82% and 88%, respectively. IDEATOR uncovers previously unrecognized vulnerabilities in VLMs, calling for advanced safety mechanisms.

Via

Access Paper or Ask Questions

BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks

Oct 28, 2024

Yunhan Zhao, Xiang Zheng, Lin Luo, Yige Li, Xingjun Ma, Yu-Gang Jiang

Figure 1 for BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks

Figure 2 for BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks

Figure 3 for BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks

Figure 4 for BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks

Abstract:Despite their superb multimodal capabilities, Vision-Language Models (VLMs) have been shown to be vulnerable to jailbreak attacks, which are inference-time attacks that induce the model to output harmful responses with tricky prompts. It is thus essential to defend VLMs against potential jailbreaks for their trustworthy deployment in real-world applications. In this work, we focus on black-box defense for VLMs against jailbreak attacks. Existing black-box defense methods are either unimodal or bimodal. Unimodal methods enhance either the vision or language module of the VLM, while bimodal methods robustify the model through text-image representation realignment. However, these methods suffer from two limitations: 1) they fail to fully exploit the cross-modal information, or 2) they degrade the model performance on benign inputs. To address these limitations, we propose a novel blue-team method BlueSuffix that defends the black-box target VLM against jailbreak attacks without compromising its performance. BlueSuffix includes three key components: 1) a visual purifier against jailbreak images, 2) a textual purifier against jailbreak texts, and 3) a blue-team suffix generator fine-tuned via reinforcement learning for enhancing cross-modal robustness. We empirically show on three VLMs (LLaVA, MiniGPT-4, and Gemini) and two safety benchmarks (MM-SafetyBench and RedTeam-2K) that BlueSuffix outperforms the baseline defenses by a significant margin. Our BlueSuffix opens up a promising direction for defending VLMs against jailbreak attacks.

Via

Access Paper or Ask Questions

Llama Scope: Extracting Millions of Features from Llama-3.1-8B with Sparse Autoencoders

Oct 27, 2024

Zhengfu He, Wentao Shu, Xuyang Ge, Lingjie Chen, Junxuan Wang, Yunhua Zhou, Frances Liu, Qipeng Guo, Xuanjing Huang, Zuxuan Wu(+2 more)

Figure 1 for Llama Scope: Extracting Millions of Features from Llama-3.1-8B with Sparse Autoencoders

Figure 2 for Llama Scope: Extracting Millions of Features from Llama-3.1-8B with Sparse Autoencoders

Figure 3 for Llama Scope: Extracting Millions of Features from Llama-3.1-8B with Sparse Autoencoders

Figure 4 for Llama Scope: Extracting Millions of Features from Llama-3.1-8B with Sparse Autoencoders

Abstract:Sparse Autoencoders (SAEs) have emerged as a powerful unsupervised method for extracting sparse representations from language models, yet scalable training remains a significant challenge. We introduce a suite of 256 SAEs, trained on each layer and sublayer of the Llama-3.1-8B-Base model, with 32K and 128K features. Modifications to a state-of-the-art SAE variant, Top-K SAEs, are evaluated across multiple dimensions. In particular, we assess the generalizability of SAEs trained on base models to longer contexts and fine-tuned models. Additionally, we analyze the geometry of learned SAE latents, confirming that \emph{feature splitting} enables the discovery of new features. The Llama Scope SAE checkpoints are publicly available at~\url{https://huggingface.co/fnlp/Llama-Scope}, alongside our scalable training, interpretation, and visualization tools at \url{https://github.com/OpenMOSS/Language-Model-SAEs}. These contributions aim to advance the open-source Sparse Autoencoder ecosystem and support mechanistic interpretability research by reducing the need for redundant SAE training.

* 22pages, 12 figures

Via

Access Paper or Ask Questions

Expose Before You Defend: Unifying and Enhancing Backdoor Defenses via Exposed Models

Oct 25, 2024

Yige Li, Hanxun Huang, Jiaming Zhang, Xingjun Ma, Yu-Gang Jiang

Figure 1 for Expose Before You Defend: Unifying and Enhancing Backdoor Defenses via Exposed Models

Figure 2 for Expose Before You Defend: Unifying and Enhancing Backdoor Defenses via Exposed Models

Figure 3 for Expose Before You Defend: Unifying and Enhancing Backdoor Defenses via Exposed Models

Figure 4 for Expose Before You Defend: Unifying and Enhancing Backdoor Defenses via Exposed Models

Abstract:Backdoor attacks covertly implant triggers into deep neural networks (DNNs) by poisoning a small portion of the training data with pre-designed backdoor triggers. This vulnerability is exacerbated in the era of large models, where extensive (pre-)training on web-crawled datasets is susceptible to compromise. In this paper, we introduce a novel two-step defense framework named Expose Before You Defend (EBYD). EBYD unifies existing backdoor defense methods into a comprehensive defense system with enhanced performance. Specifically, EBYD first exposes the backdoor functionality in the backdoored model through a model preprocessing step called backdoor exposure, and then applies detection and removal methods to the exposed model to identify and eliminate the backdoor features. In the first step of backdoor exposure, we propose a novel technique called Clean Unlearning (CUL), which proactively unlearns clean features from the backdoored model to reveal the hidden backdoor features. We also explore various model editing/modification techniques for backdoor exposure, including fine-tuning, model sparsification, and weight perturbation. Using EBYD, we conduct extensive experiments on 10 image attacks and 6 text attacks across 2 vision datasets (CIFAR-10 and an ImageNet subset) and 4 language datasets (SST-2, IMDB, Twitter, and AG's News). The results demonstrate the importance of backdoor exposure for backdoor defense, showing that the exposed models can significantly benefit a range of downstream defense tasks, including backdoor label detection, backdoor trigger recovery, backdoor model detection, and backdoor removal. We hope our work could inspire more research in developing advanced defense frameworks with exposed models. Our code is available at: https://github.com/bboylyg/Expose-Before-You-Defend.

* 19 pages

Via

Access Paper or Ask Questions

UnSeg: One Universal Unlearnable Example Generator is Enough against All Image Segmentation

Oct 13, 2024

Ye Sun, Hao Zhang, Tiehua Zhang, Xingjun Ma, Yu-Gang Jiang

Figure 1 for UnSeg: One Universal Unlearnable Example Generator is Enough against All Image Segmentation

Figure 2 for UnSeg: One Universal Unlearnable Example Generator is Enough against All Image Segmentation

Figure 3 for UnSeg: One Universal Unlearnable Example Generator is Enough against All Image Segmentation

Figure 4 for UnSeg: One Universal Unlearnable Example Generator is Enough against All Image Segmentation

Abstract:Image segmentation is a crucial vision task that groups pixels within an image into semantically meaningful segments, which is pivotal in obtaining a fine-grained understanding of real-world scenes. However, an increasing privacy concern exists regarding training large-scale image segmentation models on unauthorized private data. In this work, we exploit the concept of unlearnable examples to make images unusable to model training by generating and adding unlearnable noise into the original images. Particularly, we propose a novel Unlearnable Segmentation (UnSeg) framework to train a universal unlearnable noise generator that is capable of transforming any downstream images into their unlearnable version. The unlearnable noise generator is finetuned from the Segment Anything Model (SAM) via bilevel optimization on an interactive segmentation dataset towards minimizing the training error of a surrogate model that shares the same architecture with SAM but is trained from scratch. We empirically verify the effectiveness of UnSeg across 6 mainstream image segmentation tasks, 10 widely used datasets, and 7 different network architectures, and show that the unlearnable images can reduce the segmentation performance by a large margin. Our work provides useful insights into how to leverage foundation models in a data-efficient and computationally affordable manner to protect images against image segmentation models.

* NeurIPS 2024

Via

Access Paper or Ask Questions

EAGLE: Towards Efficient Arbitrary Referring Visual Prompts Comprehension for Multimodal Large Language Models

Sep 26, 2024

Jiacheng Zhang, Yang Jiao, Shaoxiang Chen, Jingjing Chen, Yu-Gang Jiang

Figure 1 for EAGLE: Towards Efficient Arbitrary Referring Visual Prompts Comprehension for Multimodal Large Language Models

Figure 2 for EAGLE: Towards Efficient Arbitrary Referring Visual Prompts Comprehension for Multimodal Large Language Models

Figure 3 for EAGLE: Towards Efficient Arbitrary Referring Visual Prompts Comprehension for Multimodal Large Language Models

Figure 4 for EAGLE: Towards Efficient Arbitrary Referring Visual Prompts Comprehension for Multimodal Large Language Models

Abstract:Recently, Multimodal Large Language Models (MLLMs) have sparked great research interests owing to their exceptional content-reasoning and instruction-following capabilities. To effectively instruct an MLLM, in addition to conventional language expressions, the practice of referring to objects by painting with brushes on images has emerged as a prevalent tool (referred to as "referring visual prompts") due to its efficacy in aligning the user's intention with specific image regions. To accommodate the most common referring visual prompts, namely points, boxes, and masks, existing approaches initially utilize specialized feature encoding modules to capture the semantics of the highlighted areas indicated by these prompts. Subsequently, these encoded region features are adapted to MLLMs through fine-tuning on a meticulously curated multimodal instruction dataset. However, such designs suffer from redundancy in architecture. Moreover, they face challenges in effectively generalizing when encountering a diverse range of arbitrary referring visual prompts in real-life scenarios. To address the above issues, we propose EAGLE, a novel MLLM that empowers comprehension of arbitrary referring visual prompts with less training efforts than existing approaches. Specifically, our EAGLE maintains the innate format of the referring visual prompts as colored patches rendered on the given image for conducting the instruction tuning. Our approach embeds referring visual prompts as spatial concepts conveying specific spatial areas comprehensible to the MLLM, with the semantic comprehension of these regions originating from the MLLM itself. Besides, we also propose a Geometry-Agnostic Learning paradigm (GAL) to further disentangle the MLLM's region-level comprehension with the specific formats of referring visual prompts. Extensive experiments are conducted to prove the effectiveness of our proposed method.

Via

Access Paper or Ask Questions

EventHallusion: Diagnosing Event Hallucinations in Video LLMs

Sep 25, 2024

Jiacheng Zhang, Yang Jiao, Shaoxiang Chen, Jingjing Chen, Yu-Gang Jiang

Figure 1 for EventHallusion: Diagnosing Event Hallucinations in Video LLMs

Figure 2 for EventHallusion: Diagnosing Event Hallucinations in Video LLMs

Figure 3 for EventHallusion: Diagnosing Event Hallucinations in Video LLMs

Figure 4 for EventHallusion: Diagnosing Event Hallucinations in Video LLMs

Abstract:Recently, Multimodal Large Language Models (MLLMs) have made significant progress in the video comprehension field. Despite remarkable content reasoning and instruction following capabilities they demonstrated, the hallucination problem of these VideoLLMs is less explored compared with its counterpart in the image domain. To mitigate this gap, we first propose EventHallusion, a novel benchmark that focuses on assessing the VideoLMMs' hallucination phenomenon on video event comprehension. Based on the observation that existing VideoLLMs are entangled with the priors stemming from their foundation models, our EventHallusion is curated by meticulously collecting videos and annotating questions to intentionally mislead the VideoLLMs into interpreting events based on these priors rather than accurately understanding the video content. On the other hand, we also propose a simple yet effective method, called Temporal Contrastive Decoding (TCD), to tackle the hallucination problems of VideoLLMs. The proposed TCD suppresses the model's preference toward their priors by comparing the original video with a constructed counterpart, whose temporal cues are disrupted, during the autoregressive decoding stage. Through comprehensive evaluation of eight open-source and two closed-source VideoLLMs on the proposed EventHallusion benchmark, we find that the open-source models suffer significantly from hallucination problems, whereas the closed-source models perform markedly better. By further equipping open-sourced VideoLLMs with the proposed TCD approach, evident performance improvements are achieved across most metrics in the EventHallusion benchmark. Our codes and benchmark data are available at https://github.com/Stevetich/EventHallusion.

Via

Access Paper or Ask Questions

DreamMesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3D Generation

Sep 11, 2024

Haibo Yang, Yang Chen, Yingwei Pan, Ting Yao, Zhineng Chen, Zuxuan Wu, Yu-Gang Jiang, Tao Mei

Figure 1 for DreamMesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3D Generation

Figure 2 for DreamMesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3D Generation

Figure 3 for DreamMesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3D Generation

Figure 4 for DreamMesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3D Generation

Abstract:Learning radiance fields (NeRF) with powerful 2D diffusion models has garnered popularity for text-to-3D generation. Nevertheless, the implicit 3D representations of NeRF lack explicit modeling of meshes and textures over surfaces, and such surface-undefined way may suffer from the issues, e.g., noisy surfaces with ambiguous texture details or cross-view inconsistency. To alleviate this, we present DreamMesh, a novel text-to-3D architecture that pivots on well-defined surfaces (triangle meshes) to generate high-fidelity explicit 3D model. Technically, DreamMesh capitalizes on a distinctive coarse-to-fine scheme. In the coarse stage, the mesh is first deformed by text-guided Jacobians and then DreamMesh textures the mesh with an interlaced use of 2D diffusion models in a tuning free manner from multiple viewpoints. In the fine stage, DreamMesh jointly manipulates the mesh and refines the texture map, leading to high-quality triangle meshes with high-fidelity textured materials. Extensive experiments demonstrate that DreamMesh significantly outperforms state-of-the-art text-to-3D methods in faithfully generating 3D content with richer textual details and enhanced geometry. Our project page is available at https://dreammesh.github.io.

* ECCV 2024. Project page is available at \url{https://dreammesh.github.io}

Via

Access Paper or Ask Questions

FreeEnhance: Tuning-Free Image Enhancement via Content-Consistent Noising-and-Denoising Process

Sep 11, 2024

Yang Luo, Yiheng Zhang, Zhaofan Qiu, Ting Yao, Zhineng Chen, Yu-Gang Jiang, Tao Mei

Figure 1 for FreeEnhance: Tuning-Free Image Enhancement via Content-Consistent Noising-and-Denoising Process

Figure 2 for FreeEnhance: Tuning-Free Image Enhancement via Content-Consistent Noising-and-Denoising Process

Figure 3 for FreeEnhance: Tuning-Free Image Enhancement via Content-Consistent Noising-and-Denoising Process

Figure 4 for FreeEnhance: Tuning-Free Image Enhancement via Content-Consistent Noising-and-Denoising Process

Abstract:The emergence of text-to-image generation models has led to the recognition that image enhancement, performed as post-processing, would significantly improve the visual quality of the generated images. Exploring diffusion models to enhance the generated images nevertheless is not trivial and necessitates to delicately enrich plentiful details while preserving the visual appearance of key content in the original image. In this paper, we propose a novel framework, namely FreeEnhance, for content-consistent image enhancement using the off-the-shelf image diffusion models. Technically, FreeEnhance is a two-stage process that firstly adds random noise to the input image and then capitalizes on a pre-trained image diffusion model (i.e., Latent Diffusion Models) to denoise and enhance the image details. In the noising stage, FreeEnhance is devised to add lighter noise to the region with higher frequency to preserve the high-frequent patterns (e.g., edge, corner) in the original image. In the denoising stage, we present three target properties as constraints to regularize the predicted noise, enhancing images with high acutance and high visual quality. Extensive experiments conducted on the HPDv2 dataset demonstrate that our FreeEnhance outperforms the state-of-the-art image enhancement models in terms of quantitative metrics and human preference. More remarkably, FreeEnhance also shows higher human preference compared to the commercial image enhancement solution of Magnific AI.

* ACM Multimedia 2024

Via

Access Paper or Ask Questions