Zhefeng Wang

Adacc: Adaptive Compression and Activation Checkpointing for LLM Memory Management

Aug 01, 2025

Alignment-Augmented Speculative Decoding with Alignment Sampling and Conditional Verification

May 19, 2025

Accurate KV Cache Quantization with Outlier Tokens Tracing

May 16, 2025

Taming the Titans: A Survey of Efficient LLM Inference Serving

Apr 28, 2025

FASP: Fast and Accurate Structured Pruning of Large Language Models

Jan 16, 2025

Beware of Calibration Data for Pruning Large Language Models

Oct 23, 2024

A Convex-optimization-based Layer-wise Post-training Pruner for Large Language Models

Aug 07, 2024

MSceneSpeech: A Multi-Scene Speech Dataset For Expressive Speech Synthesis

Jul 19, 2024

OPT-Tree: Speculative Decoding with Adaptive Draft Tree Structure

Jun 25, 2024

Optimizing Large Model Training through Overlapped Activation Recomputation

Jun 13, 2024