Xianzhi Yu and Other Contributors

EAQuant: Enhancing Post-Training Quantization for MoE Models via Expert-Aware Optimization

Jun 16, 2025

A Simple Linear Patch Revives Layer-Pruned Large Language Models

May 30, 2025

Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity

May 28, 2025

MoESD: Unveil Speculative Decoding's Potential for Accelerating Sparse MoE

May 26, 2025

Faster and Better LLMs via Latency-Aware Test-Time Scaling

May 26, 2025

PreMoe: Lightening MoEs on Constrained Memory by Expert Pruning and Retrieval

May 23, 2025

L-MTP: Leap Multi-Token Prediction Beyond Adjacent Context for Large Language Models

May 23, 2025

TrimR: Verifier-based Training-Free Thinking Compression for Efficient Test-Time Scaling

May 22, 2025

Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs

May 07, 2025

Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models

Apr 07, 2025