Xianzhi Yu and Other Contributors

A Simple Linear Patch Revives Layer-Pruned Large Language Models (May 30, 2025)

Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity (May 28, 2025)

Faster and Better LLMs via Latency-Aware Test-Time Scaling (May 26, 2025)

MoESD: Unveil Speculative Decoding's Potential for Accelerating Sparse MoE (May 26, 2025)

PreMoe: Lightening MoEs on Constrained Memory by Expert Pruning and Retrieval (May 23, 2025)

L-MTP: Leap Multi-Token Prediction Beyond Adjacent Context for Large Language Models (May 23, 2025)

TrimR: Verifier-based Training-Free Thinking Compression for Efficient Test-Time Scaling (May 22, 2025)

Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs (May 07, 2025)

Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models (Apr 07, 2025)

CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference (Feb 06, 2025)