Shuai Zhang

Unlearning through Knowledge Overwriting: Reversible Federated Unlearning via Selective Sparse Adapter

Feb 28, 2025

Zhengyi Zhong, Weidong Bao, Ji Wang, Shuai Zhang, Jingxuan Zhou, Lingjuan Lyu, Wei Yang Bryan Lim

Abstract:Federated Learning is a promising paradigm for privacy-preserving collaborative model training. In practice, it is essential not only to continuously train the model to acquire new knowledge but also to guarantee old knowledge the right to be forgotten (i.e., federated unlearning), especially for privacy-sensitive information or harmful knowledge. However, current federated unlearning methods face several challenges, including indiscriminate unlearning of cross-client knowledge, irreversibility of unlearning, and significant unlearning costs. To this end, we propose a method named FUSED, which first identifies critical layers by analyzing each layer's sensitivity to knowledge and constructs sparse unlearning adapters for sensitive ones. Then, the adapters are trained without altering the original parameters, overwriting the unlearning knowledge with the remaining knowledge. This knowledge overwriting process enables FUSED to mitigate the effects of indiscriminate unlearning. Moreover, the introduction of independent adapters makes unlearning reversible and significantly reduces the unlearning costs. Finally, extensive experiments on three datasets across various unlearning scenarios demonstrate that FUSED's effectiveness is comparable to Retraining, surpassing all other baselines while greatly reducing unlearning costs.
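The reversibility claim in the abstract above can be illustrated with a toy sketch. This is not the FUSED implementation: the `AdaptedModel` class, the layer names, and the hand-picked adapter values are all hypothetical, and the sensitivity analysis and adapter training steps are omitted. It only shows why keeping the original parameters frozen and storing changes in a separate sparse adapter makes the change undoable.

```python
from dataclasses import dataclass, field


@dataclass
class AdaptedModel:
    """Toy model: frozen base weights plus an optional sparse adapter (hypothetical)."""
    base: dict                                    # layer name -> list of weights, never modified
    adapter: dict = field(default_factory=dict)   # layer name -> {weight index: delta}

    def effective(self, layer):
        """Return the weights actually used for a layer: base plus sparse deltas."""
        weights = list(self.base[layer])          # copy, so the base stays untouched
        for idx, delta in self.adapter.get(layer, {}).items():
            weights[idx] += delta
        return weights

    def detach(self):
        """Remove the adapter and return it, restoring the original behaviour."""
        removed, self.adapter = self.adapter, {}
        return removed


model = AdaptedModel(base={"fc1": [0.5, -1.0, 2.0], "fc2": [1.5, 0.25]})

# Assume "fc2" was judged sensitive; the delta is made up for illustration.
model.adapter = {"fc2": {1: -0.25}}
with_adapter = model.effective("fc2")

model.detach()
restored = model.effective("fc2")
```

Because the adapter only overlays the selected layer, detaching it recovers the original weights exactly, and only the adapter entries would need to be stored or trained.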

* Accepted by CVPR2025

Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking

Feb 04, 2025

Jinyang Wu, Mingkuan Feng, Shuai Zhang, Ruihan Jin, Feihu Che, Zengqi Wen, Jianhua Tao

Abstract:Multimodal large language models (MLLMs) exhibit impressive capabilities but still face challenges in complex visual reasoning. While recent efforts attempt to enhance MLLMs' reasoning by incorporating OpenAI o1-like structured thinking through explicit search structures or teacher-guided distillation, they often struggle to balance performance and efficiency. A critical limitation is their heavy reliance on extensive data and search spaces, resulting in low-efficiency implicit insight extraction and data utilization. To address this, we propose AStar, an Automated Structured thinking paradigm for multimodal reasoning via Monte Carlo Tree Search (MCTS). AStar automatically derives high-level cognitive reasoning patterns from limited data using MCTS-powered hierarchical structures. Building on these explicit patterns, we design a unified reasoning framework that seamlessly integrates models' internal reasoning capabilities and external reasoning guidelines, enabling efficient inference with minimal tree iterations. This novel paradigm strikes a compelling balance between performance and efficiency. Extensive experiments demonstrate AStar's effectiveness, achieving superior accuracy (54.0%) on the MathVerse benchmark with a 7B backbone, surpassing GPT-4o (50.2%) while maintaining substantial data and computational efficiency.

DReSS: Data-driven Regularized Structured Streamlining for Large Language Models

Jan 29, 2025

Mingkuan Feng, Jinyang Wu, Shuai Zhang, Pengpeng Shao, Ruihan Jin, Zhengqi Wen, Jianhua Tao, Feihu Che

Abstract:Large language models (LLMs) have achieved significant progress across various domains, but their increasing scale results in high computational and memory costs. Recent studies have revealed that LLMs exhibit sparsity, providing the potential to reduce model size through pruning techniques. However, existing pruning methods typically follow a prune-then-finetune paradigm. Since the pruned components still contain valuable information, their direct removal often leads to irreversible performance degradation, imposing a substantial computational burden to recover performance during finetuning. In this paper, we propose a novel paradigm that first applies regularization, then prunes, and finally finetunes. Based on this paradigm, we introduce DReSS, a simple and effective Data-driven Regularized Structured Streamlining method for LLMs. By leveraging a small amount of data to regularize the components to be pruned, DReSS explicitly transfers the important information to the remaining parts of the model in advance. Compared to direct pruning, this can reduce the information loss caused by parameter removal, thereby enhancing its language modeling capabilities. Experimental results demonstrate that DReSS significantly outperforms existing pruning methods even under extreme pruning ratios, significantly reducing latency and increasing throughput.

Is FISHER All You Need in The Multi-AUV Underwater Target Tracking Task?

Dec 05, 2024

Jingzehua Xu, Guanwen Xie, Ziqi Zhang, Xiangwang Hou, Dongfang Ma, Shuai Zhang, Yong Ren, Dusit Niyato

Abstract:It is significant to employ multiple autonomous underwater vehicles (AUVs) to execute the underwater target tracking task collaboratively. However, it's pretty challenging to meet various prerequisites utilizing traditional control methods. Therefore, we propose an effective two-stage learning from demonstrations training framework, FISHER, to highlight the adaptability of reinforcement learning (RL) methods in the multi-AUV underwater target tracking task, while addressing its limitations such as extensive requirements for environmental interactions and the challenges in designing reward functions. The first stage utilizes imitation learning (IL) to realize policy improvement and generate offline datasets. To be specific, we introduce multi-agent discriminator-actor-critic based on improvements of the generative adversarial IL algorithm and multi-agent IL optimization objective derived from the Nash equilibrium condition. Then in the second stage, we develop multi-agent independent generalized decision transformer, which analyzes the latent representation to match the future states of high-quality samples rather than reward function, attaining further enhanced policies capable of handling various scenarios. Besides, we propose a simulation to simulation demonstration generation procedure to facilitate the generation of expert demonstrations in underwater environments, which capitalizes on traditional control methods and can easily accomplish the domain transfer to obtain demonstrations. Extensive simulation experiments from multiple scenarios showcase that FISHER possesses strong stability, multi-task performance and capability of generalization.

Beyond Examples: High-level Automated Reasoning Paradigm in In-Context Learning via MCTS

Nov 27, 2024

Jinyang Wu, Mingkuan Feng, Shuai Zhang, Feihu Che, Zengqi Wen, Jianhua Tao

Abstract:In-context Learning (ICL) enables large language models (LLMs) to tackle downstream tasks through sophisticated prompting and high-quality demonstrations. However, this traditional ICL paradigm shows limitations when facing complex mathematical reasoning tasks, primarily due to its heavy dependence on example quality and the necessity for human intervention in challenging scenarios. To address these limitations, this paper presents HiAR-ICL, a High-level Automated Reasoning paradigm in ICL that shifts focus from specific examples to abstract thinking patterns, extending the conventional concept of context in ICL. HiAR-ICL introduces five atomic reasoning actions as fundamental components for constructing chain-structured patterns. Using Monte Carlo Tree Search, we explore reasoning paths and construct thought cards to guide subsequent inference. We then develop a cognitive complexity framework that dynamically matches problems with appropriate thought cards. Experimental results demonstrate HiAR-ICL's effectiveness, achieving state-of-the-art accuracy (79.6%) on the MATH benchmark with Qwen2.5-7B-Instruct, surpassing GPT-4o (76.6%) and Claude 3.5 (71.1%).

Multimodal Instruction Tuning with Hybrid State Space Models

Nov 13, 2024

Jianing Zhou, Han Li, Shuai Zhang, Ning Xie, Ruijie Wang, Xiaohan Nie, Sheng Liu, Lingyun Wang

Abstract:Handling lengthy context is crucial for enhancing the recognition and understanding capabilities of multimodal large language models (MLLMs) in applications such as processing high-resolution images or high frame rate videos. The rise in image resolution and frame rate substantially increases computational demands due to the increased number of input tokens. This challenge is further exacerbated by the quadratic complexity with respect to sequence length of the self-attention mechanism. Most prior works either pre-train models with long contexts, overlooking the efficiency problem, or attempt to reduce the context length via downsampling (e.g., identifying the key image patches or frames), which may result in information loss. To circumvent this issue while keeping the remarkable effectiveness of MLLMs, we propose a novel approach using a hybrid transformer-MAMBA model to efficiently handle long contexts in multimodal applications. Our multimodal model can effectively process long context input exceeding 100k tokens, outperforming existing models across various benchmarks. Remarkably, our model enhances inference efficiency for high-resolution images and high-frame-rate videos by about 4 times compared to current models, with efficiency gains increasing as image resolution or video frames rise. Furthermore, our model is the first to be trained on low-resolution images or low-frame-rate videos while being capable of inference on high-resolution images and high-frame-rate videos, offering flexibility for inference in diverse scenarios.

Unraveling the Gradient Descent Dynamics of Transformers

Nov 12, 2024

Bingqing Song, Boran Han, Shuai Zhang, Jie Ding, Mingyi Hong

Abstract:While the Transformer architecture has achieved remarkable success across various domains, a thorough theoretical foundation explaining its optimization dynamics is yet to be fully developed. In this study, we aim to bridge this understanding gap by answering the following two core questions: (1) Which types of Transformer architectures allow Gradient Descent (GD) to achieve guaranteed convergence? and (2) Under what initial conditions and architectural specifics does the Transformer achieve rapid convergence during training? By analyzing the loss landscape of a single Transformer layer using Softmax and Gaussian attention kernels, our work provides concrete answers to these questions. Our findings demonstrate that, with appropriate weight initialization, GD can train a Transformer model (with either kernel type) to achieve a global optimal solution, especially when the input embedding dimension is large. Nonetheless, certain scenarios highlight potential pitfalls: training a Transformer using the Softmax attention kernel may sometimes lead to suboptimal local solutions. In contrast, the Gaussian attention kernel exhibits much more favorable behavior. Our empirical study further validates the theoretical findings.
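To make the two attention kernels named in the abstract above concrete, here is a minimal pure-Python sketch. It assumes one common textbook form of each kernel (dot-product softmax, and an unnormalized Gaussian on squared Euclidean distance with a hypothetical bandwidth `sigma`); the paper's exact definitions and scaling may differ. The example vectors are made up.

```python
import math


def softmax_weights(query, keys):
    """Softmax attention weights from dot-product scores (sums to 1)."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)                                  # subtract max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gaussian_weights(query, keys, sigma=1.0):
    """Unnormalized Gaussian kernel weights based on squared distance (assumed form)."""
    weights = []
    for key in keys:
        sq_dist = sum((q - k) ** 2 for q, k in zip(query, key))
        weights.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
    return weights


query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [3.0, 0.0]]

soft = softmax_weights(query, keys)
gauss = gaussian_weights(query, keys)

# Index of the key each kernel attends to most strongly.
soft_top = soft.index(max(soft))
gauss_top = gauss.index(max(gauss))
```

In this toy setup the softmax kernel favors the key with the largest dot product (the long vector at index 2), while the Gaussian kernel favors the key closest to the query (index 0). This only illustrates how the kernels differ; it says nothing about their convergence behavior.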

PalmBench: A Comprehensive Benchmark of Compressed Large Language Models on Mobile Platforms

Oct 05, 2024

Yilong Li, Jingyu Liu, Hao Zhang, M Badri Narayanan, Utkarsh Sharma, Shuai Zhang, Pan Hu, Yijing Zeng, Jayaram Raghuram, Suman Banerjee

Abstract:Deploying large language models (LLMs) locally on mobile devices is advantageous in scenarios where transmitting data to remote cloud servers is either undesirable due to privacy concerns or impractical due to network connection. Recent advancements (MLC, 2023a; Gerganov, 2023) have facilitated the local deployment of LLMs. However, local deployment also presents challenges, particularly in balancing quality (generative performance), latency, and throughput within the hardware constraints of mobile devices. In this paper, we introduce our lightweight, all-in-one automated benchmarking framework that allows users to evaluate LLMs on mobile devices. We provide a comprehensive benchmark of various popular LLMs with different quantization configurations (both weights and activations) across multiple mobile platforms with varying hardware capabilities. Unlike traditional benchmarks that assess full-scale models on high-end GPU clusters, we focus on evaluating resource efficiency (memory and power consumption) and harmful output for compressed models on mobile devices. Our key observations include i) differences in energy efficiency and throughput across mobile platforms; ii) the impact of quantization on memory usage, GPU execution time, and power consumption; iii) accuracy and performance degradation of quantized models compared to their non-quantized counterparts; and iv) the frequency of hallucinations and toxic content generated by compressed LLMs on mobile devices.

* 10 pages

CausalVE: Face Video Privacy Encryption via Causal Video Prediction

Sep 28, 2024

Yubo Huang, Wenhao Feng, Xin Lai, Zixi Wang, Jingzehua Xu, Shuai Zhang, Hongjie He, Fan Chen

Abstract:Advanced facial recognition technologies and recommender systems with inadequate privacy technologies and policies for facial interactions increase concerns about bioprivacy violations. With the proliferation of video and live-streaming websites, public-face video distribution and interactions pose greater privacy risks. Existing techniques typically address the risk of sensitive biometric information leakage through various privacy enhancement methods but pose a higher security risk by corrupting the information to be conveyed by the interaction data, or by leaving certain biometric features intact that allow an attacker to infer sensitive biometric information from them. To address these shortcomings, in this paper, we propose a neural network framework, CausalVE. We obtain cover images by adopting a diffusion model to achieve face swapping with face guidance and use the speech sequence features and spatiotemporal sequence features of the secret video for dynamic video inference and prediction to obtain a cover video with the same number of frames as the secret video. In addition, we hide the secret video by using reversible neural networks for video hiding so that the video can also disseminate secret data. Numerous experiments prove that our CausalVE has good security in public video dissemination and outperforms state-of-the-art methods from a qualitative, quantitative, and visual point of view.

* Submitted to ICLR 2025

Dynamic 2D Gaussians: Geometrically accurate radiance fields for dynamic objects

Sep 21, 2024

Shuai Zhang, Guanjun Wu, Xinggang Wang, Bin Feng, Wenyu Liu

Abstract:Reconstructing objects and extracting high-quality surfaces play a vital role in the real world. Current 4D representations show the ability to render high-quality novel views for dynamic objects but cannot reconstruct high-quality meshes due to their implicit or geometrically inaccurate representations. In this paper, we propose a novel representation that can reconstruct accurate meshes from sparse image input, named Dynamic 2D Gaussians (D-2DGS). We adopt 2D Gaussians for basic geometry representation and use sparse-controlled points to capture 2D Gaussian's deformation. By extracting the object mask from the rendered high-quality image and masking the rendered depth map, a high-quality dynamic mesh sequence of the object can be extracted. Experiments demonstrate that our D-2DGS is outstanding in reconstructing high-quality meshes from sparse input. More demos and code are available at https://github.com/hustvl/Dynamic-2DGS.
