Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Chen Wu

Prompting Forgetting: Unlearning in GANs via Textual Guidance

Apr 01, 2025

Piyush Nagasubramaniam, Neeraj Karamchandani, Chen Wu, Sencun Zhu

Abstract:State-of-the-art generative models exhibit powerful image-generation capabilities, introducing various ethical and legal challenges to service providers hosting these models. Consequently, Content Removal Techniques (CRTs) have emerged as a growing area of research to control outputs without full-scale retraining. Recent work has explored the use of Machine Unlearning in generative models to address content removal. However, the focus of such research has been on diffusion models, and unlearning in Generative Adversarial Networks (GANs) has remained largely unexplored. We address this gap by proposing Text-to-Unlearn, a novel framework that selectively unlearns concepts from pre-trained GANs using only text prompts, enabling feature unlearning, identity unlearning, and fine-grained tasks like expression and multi-attribute removal in models trained on human faces. Leveraging natural language descriptions, our approach guides the unlearning process without requiring additional datasets or supervised fine-tuning, offering a scalable and efficient solution. To evaluate its effectiveness, we introduce an automatic unlearning assessment method adapted from state-of-the-art image-text alignment metrics, providing a comprehensive analysis of the unlearning methodology. To our knowledge, Text-to-Unlearn is the first cross-modal unlearning framework for GANs, representing a flexible and efficient advancement in managing generative model behavior.

Via

Access Paper or Ask Questions

LLMs Know What to Drop: Self-Attention Guided KV Cache Eviction for Efficient Long-Context Inference

Mar 11, 2025

Guangtao Wang, Shubhangi Upasani, Chen Wu, Darshan Gandhi, Jonathan Li, Changran Hu, Bo Li, Urmish Thakker

Figure 1 for LLMs Know What to Drop: Self-Attention Guided KV Cache Eviction for Efficient Long-Context Inference

Figure 2 for LLMs Know What to Drop: Self-Attention Guided KV Cache Eviction for Efficient Long-Context Inference

Figure 3 for LLMs Know What to Drop: Self-Attention Guided KV Cache Eviction for Efficient Long-Context Inference

Figure 4 for LLMs Know What to Drop: Self-Attention Guided KV Cache Eviction for Efficient Long-Context Inference

Abstract:Efficient long-context inference is critical as large language models (LLMs) adopt context windows of ranging from 128K to 1M tokens. However, the growing key-value (KV) cache and the high computational complexity of attention create significant bottlenecks in memory usage and latency. In this paper, we find that attention in diverse long-context tasks exhibits sparsity, and LLMs implicitly "know" which tokens can be dropped or evicted at the head level after the pre-filling stage. Based on this insight, we propose Self-Attention Guided Eviction~(SAGE-KV), a simple and effective KV eviction cache method for long-context inference. After prefilling, our method performs a one-time top-k selection at both the token and head levels to compress the KV cache, enabling efficient inference with the reduced cache. Evaluations on LongBench and three long-context LLMs (Llama3.1-8B-Instruct-128k, Llama3-8B-Prolong-512k-Instruct, and Qwen2.5-7B-Instruct-128k) show that SAGE-KV maintains accuracy comparable to full attention while significantly improving efficiency. Specifically, SAGE-KV achieves 4x higher memory efficiency with improved accuracy over the static KV cache selection method StreamLLM, and 2x higher memory efficiency with better accuracy than the dynamic KV cache selection method Quest.

Via

Access Paper or Ask Questions

Plug-and-Play DISep: Separating Dense Instances for Scene-to-Pixel Weakly-Supervised Change Detection in High-Resolution Remote Sensing Images

Jan 10, 2025

Zhenghui Zhao, Chen Wu, Lixiang Ru, Di Wang, Hongruixuan Chen, Cuiqun Chen

Abstract:Existing Weakly-Supervised Change Detection (WSCD) methods often encounter the problem of "instance lumping" under scene-level supervision, particularly in scenarios with a dense distribution of changed instances (i.e., changed objects). In these scenarios, unchanged pixels between changed instances are also mistakenly identified as changed, causing multiple changes to be mistakenly viewed as one. In practical applications, this issue prevents the accurate quantification of the number of changes. To address this issue, we propose a Dense Instance Separation (DISep) method as a plug-and-play solution, refining pixel features from a unified instance perspective under scene-level supervision. Specifically, our DISep comprises a three-step iterative training process: 1) Instance Localization: We locate instance candidate regions for changed pixels using high-pass class activation maps. 2) Instance Retrieval: We identify and group these changed pixels into different instance IDs through connectivity searching. Then, based on the assigned instance IDs, we extract corresponding pixel-level features on a per-instance basis. 3) Instance Separation: We introduce a separation loss to enforce intra-instance pixel consistency in the embedding space, thereby ensuring separable instance feature representations. The proposed DISep adds only minimal training cost and no inference cost. It can be seamlessly integrated to enhance existing WSCD methods. We achieve state-of-the-art performance by enhancing {three Transformer-based and four ConvNet-based methods} on the LEVIR-CD, WHU-CD, DSIFN-CD, SYSU-CD, and CDD datasets. Additionally, our DISep can be used to improve fully-supervised change detection methods. Code is available at https://github.com/zhenghuizhao/Plug-and-Play-DISep-for-Change-Detection.

* Accepted by ISPRS Journal of Photogrammetry and Remote Sensing

Via

Access Paper or Ask Questions

Efficient Ternary Weight Embedding Model: Bridging Scalability and Performance

Nov 23, 2024

Jiayi Chen, Chen Wu, Shaoqun Zhang, Nan Li, Liangjie Zhang, Qi Zhang

Figure 1 for Efficient Ternary Weight Embedding Model: Bridging Scalability and Performance

Figure 2 for Efficient Ternary Weight Embedding Model: Bridging Scalability and Performance

Figure 3 for Efficient Ternary Weight Embedding Model: Bridging Scalability and Performance

Figure 4 for Efficient Ternary Weight Embedding Model: Bridging Scalability and Performance

Abstract:Embedding models have become essential tools in both natural language processing and computer vision, enabling efficient semantic search, recommendation, clustering, and more. However, the high memory and computational demands of full-precision embeddings pose challenges for deployment in resource-constrained environments, such as real-time recommendation systems. In this work, we propose a novel finetuning framework to ternary-weight embedding models, which reduces memory and computational overhead while maintaining high performance. To apply ternarization to pre-trained embedding models, we introduce self-taught knowledge distillation to finalize the ternary-weights of the linear layers. With extensive experiments on public text and vision datasets, we demonstrated that without sacrificing effectiveness, the ternarized model consumes low memory usage and has low latency in the inference stage with great efficiency. In practical implementations, embedding models are typically integrated with Approximate Nearest Neighbor (ANN) search. Our experiments combining ternary embedding with ANN search yielded impressive improvement in both accuracy and computational efficiency. The repository is available at here.

* Technical Report

Via

Access Paper or Ask Questions

TSFormer: A Robust Framework for Efficient UHD Image Restoration

Nov 19, 2024

Xin Su, Chen Wu, Zhuoran Zheng

Figure 1 for TSFormer: A Robust Framework for Efficient UHD Image Restoration

Figure 2 for TSFormer: A Robust Framework for Efficient UHD Image Restoration

Figure 3 for TSFormer: A Robust Framework for Efficient UHD Image Restoration

Figure 4 for TSFormer: A Robust Framework for Efficient UHD Image Restoration

Abstract:Ultra-high-definition (UHD) image restoration is vital for applications demanding exceptional visual fidelity, yet existing methods often face a trade-off between restoration quality and efficiency, limiting their practical deployment. In this paper, we propose TSFormer, an all-in-one framework that integrates \textbf{T}rusted learning with \textbf{S}parsification to boost both generalization capability and computational efficiency in UHD image restoration. The key is that only a small amount of token movement is allowed within the model. To efficiently filter tokens, we use Min-$p$ with random matrix theory to quantify the uncertainty of tokens, thereby improving the robustness of the model. Our model can run a 4K image in real time (40fps) with 3.38 M parameters. Extensive experiments demonstrate that TSFormer achieves state-of-the-art restoration quality while enhancing generalization and reducing computational demands. In addition, our token filtering method can be applied to other image restoration models to effectively accelerate inference and maintain performance.

Via

Access Paper or Ask Questions

Dropout the High-rate Downsampling: A Novel Design Paradigm for UHD Image Restoration

Nov 10, 2024

Chen Wu, Ling Wang, Long Peng, Dianjie Lu, Zhuoran Zheng

Figure 1 for Dropout the High-rate Downsampling: A Novel Design Paradigm for UHD Image Restoration

Figure 2 for Dropout the High-rate Downsampling: A Novel Design Paradigm for UHD Image Restoration

Figure 3 for Dropout the High-rate Downsampling: A Novel Design Paradigm for UHD Image Restoration

Figure 4 for Dropout the High-rate Downsampling: A Novel Design Paradigm for UHD Image Restoration

Abstract:With the popularization of high-end mobile devices, Ultra-high-definition (UHD) images have become ubiquitous in our lives. The restoration of UHD images is a highly challenging problem due to the exaggerated pixel count, which often leads to memory overflow during processing. Existing methods either downsample UHD images at a high rate before processing or split them into multiple patches for separate processing. However, high-rate downsampling leads to significant information loss, while patch-based approaches inevitably introduce boundary artifacts. In this paper, we propose a novel design paradigm to solve the UHD image restoration problem, called D2Net. D2Net enables direct full-resolution inference on UHD images without the need for high-rate downsampling or dividing the images into several patches. Specifically, we ingeniously utilize the characteristics of the frequency domain to establish long-range dependencies of features. Taking into account the richer local patterns in UHD images, we also design a multi-scale convolutional group to capture local features. Additionally, during the decoding stage, we dynamically incorporate features from the encoding stage to reduce the flow of irrelevant information. Extensive experiments on three UHD image restoration tasks, including low-light image enhancement, image dehazing, and image deblurring, show that our model achieves better quantitative and qualitative results than state-of-the-art methods.

* WACV2025

Via

Access Paper or Ask Questions

HorGait: Advancing Gait Recognition with Efficient High-Order Spatial Interactions in LiDAR Point Clouds

Oct 11, 2024

Jiaxing Hao, Yanxi Wang, Zhigang Chang, Hongmin Gao, Zihao Cheng, Chen Wu, Xin Zhao, Peiye Fang, Rachmat Muwardi

Abstract:Gait recognition is a remote biometric technology that utilizes the dynamic characteristics of human movement to identify individuals even under various extreme lighting conditions. Due to the limitation in spatial perception capability inherent in 2D gait representations, LiDAR can directly capture 3D gait features and represent them as point clouds, reducing environmental and lighting interference in recognition while significantly advancing privacy protection. For complex 3D representations, shallow networks fail to achieve accurate recognition, making vision Transformers the foremost prevalent method. However, the prevalence of dumb patches has limited the widespread use of Transformer architecture in gait recognition. This paper proposes a method named HorGait, which utilizes a hybrid model with a Transformer architecture for gait recognition on the planar projection of 3D point clouds from LiDAR. Specifically, it employs a hybrid model structure called LHM Block to achieve input adaptation, long-range, and high-order spatial interaction of the Transformer architecture. Additionally, it uses large convolutional kernel CNNs to segment the input representation, replacing attention windows to reduce dumb patches. We conducted extensive experiments, and the results show that HorGait achieves state-of-the-art performance among Transformer architecture methods on the SUSTech1K dataset, verifying that the hybrid model can complete the full Transformer process and perform better in point cloud planar projection. The outstanding performance of HorGait offers new insights for the future application of the Transformer architecture in gait recognition.

Via

Access Paper or Ask Questions

BrainDreamer: Reasoning-Coherent and Controllable Image Generation from EEG Brain Signals via Language Guidance

Sep 21, 2024

Ling Wang, Chen Wu, Lin Wang

Figure 1 for BrainDreamer: Reasoning-Coherent and Controllable Image Generation from EEG Brain Signals via Language Guidance

Figure 2 for BrainDreamer: Reasoning-Coherent and Controllable Image Generation from EEG Brain Signals via Language Guidance

Figure 3 for BrainDreamer: Reasoning-Coherent and Controllable Image Generation from EEG Brain Signals via Language Guidance

Figure 4 for BrainDreamer: Reasoning-Coherent and Controllable Image Generation from EEG Brain Signals via Language Guidance

Abstract:Can we directly visualize what we imagine in our brain together with what we describe? The inherent nature of human perception reveals that, when we think, our body can combine language description and build a vivid picture in our brain. Intuitively, generative models should also hold such versatility. In this paper, we introduce BrainDreamer, a novel end-to-end language-guided generative framework that can mimic human reasoning and generate high-quality images from electroencephalogram (EEG) brain signals. Our method is superior in its capacity to eliminate the noise introduced by non-invasive EEG data acquisition and meanwhile achieve a more precise mapping between the EEG and image modality, thus leading to significantly better-generated images. Specifically, BrainDreamer consists of two key learning stages: 1) modality alignment and 2) image generation. In the alignment stage, we propose a novel mask-based triple contrastive learning strategy to effectively align EEG, text, and image embeddings to learn a unified representation. In the generation stage, we inject the EEG embeddings into the pre-trained Stable Diffusion model by designing a learnable EEG adapter to generate high-quality reasoning-coherent images. Moreover, BrainDreamer can accept textual descriptions (e.g., color, position, etc.) to achieve controllable image generation. Extensive experiments show that our method significantly outperforms prior arts in terms of generating quality and quantitative performance.

Via

Access Paper or Ask Questions

SpheriGait: Enriching Spatial Representation via Spherical Projection for LiDAR-based Gait Recognition

Sep 18, 2024

Yanxi Wang, Zhigang Chang, Chen Wu, Zihao Cheng, Hongmin Gao

Abstract:Gait recognition is a rapidly progressing technique for the remote identification of individuals. Prior research predominantly employing 2D sensors to gather gait data has achieved notable advancements; nonetheless, they have unavoidably neglected the influence of 3D dynamic characteristics on recognition. Gait recognition utilizing LiDAR 3D point clouds not only directly captures 3D spatial features but also diminishes the impact of lighting conditions while ensuring privacy protection.The essence of the problem lies in how to effectively extract discriminative 3D dynamic representation from point clouds.In this paper, we proposes a method named SpheriGait for extracting and enhancing dynamic features from point clouds for Lidar-based gait recognition. Specifically, it substitutes the conventional point cloud plane projection method with spherical projection to augment the perception of dynamic feature.Additionally, a network block named DAM-L is proposed to extract gait cues from the projected point cloud data. We conducted extensive experiments and the results demonstrated the SpheriGait achieved state-of-the-art performance on the SUSTech1K dataset, and verified that the spherical projection method can serve as a universal data preprocessing technique to enhance the performance of other LiDAR-based gait recognition methods, exhibiting exceptional flexibility and practicality.

Via

Access Paper or Ask Questions

Multi-Scale Representation Learning for Image Restoration with State-Space Model

Aug 19, 2024

Yuhong He, Long Peng, Qiaosi Yi, Chen Wu, Lu Wang

Figure 1 for Multi-Scale Representation Learning for Image Restoration with State-Space Model

Figure 2 for Multi-Scale Representation Learning for Image Restoration with State-Space Model

Figure 3 for Multi-Scale Representation Learning for Image Restoration with State-Space Model

Figure 4 for Multi-Scale Representation Learning for Image Restoration with State-Space Model

Abstract:Image restoration endeavors to reconstruct a high-quality, detail-rich image from a degraded counterpart, which is a pivotal process in photography and various computer vision systems. In real-world scenarios, different types of degradation can cause the loss of image details at various scales and degrade image contrast. Existing methods predominantly rely on CNN and Transformer to capture multi-scale representations. However, these methods are often limited by the high computational complexity of Transformers and the constrained receptive field of CNN, which hinder them from achieving superior performance and efficiency in image restoration. To address these challenges, we propose a novel Multi-Scale State-Space Model-based (MS-Mamba) for efficient image restoration that enhances the capacity for multi-scale representation learning through our proposed global and regional SSM modules. Additionally, an Adaptive Gradient Block (AGB) and a Residual Fourier Block (RFB) are proposed to improve the network's detail extraction capabilities by capturing gradients in various directions and facilitating learning details in the frequency domain. Extensive experiments on nine public benchmarks across four classic image restoration tasks, image deraining, dehazing, denoising, and low-light enhancement, demonstrate that our proposed method achieves new state-of-the-art performance while maintaining low computational complexity. The source code will be publicly available.

Via

Access Paper or Ask Questions