Zhefeng Wang

Discovering Decoupled Functional Modules in Large Language Models

Mar 18, 2026
Via arXiv

Adacc: Adaptive Compression and Activation Checkpointing for LLM Memory Management

Aug 01, 2025
Via arXiv

Alignment-Augmented Speculative Decoding with Alignment Sampling and Conditional Verification

May 19, 2025
Via arXiv

Accurate KV Cache Quantization with Outlier Tokens Tracing

May 16, 2025
Via arXiv

Taming the Titans: A Survey of Efficient LLM Inference Serving

Apr 28, 2025
Via arXiv

FASP: Fast and Accurate Structured Pruning of Large Language Models

Jan 16, 2025
Via arXiv

Beware of Calibration Data for Pruning Large Language Models

Oct 23, 2024
Via arXiv

A Convex-optimization-based Layer-wise Post-training Pruner for Large Language Models

Aug 07, 2024
Via arXiv

MSceneSpeech: A Multi-Scene Speech Dataset For Expressive Speech Synthesis

Jul 19, 2024
Via arXiv

OPT-Tree: Speculative Decoding with Adaptive Draft Tree Structure

Jun 25, 2024
Via arXiv