Xianzhi Yu and Other Contributors

What Makes Low-Bit Quantization-Aware Training Work for Reasoning LLMs? A Systematic Study

Jan 21, 2026

Benchmarking Post-Training Quantization of Large Language Models under Microscaling Floating Point Formats

Jan 14, 2026

SwiftMem: Fast Agentic Memory via Query-aware Indexing

Jan 13, 2026

Revisiting Judge Decoding from First Principles via Training-Free Distributional Divergence

Jan 08, 2026

What Matters For Safety Alignment?

Jan 07, 2026

Towards Efficient Agents: A Co-Design of Inference Architecture and System

Dec 20, 2025

EAQuant: Enhancing Post-Training Quantization for MoE Models via Expert-Aware Optimization

Jun 16, 2025

A Simple Linear Patch Revives Layer-Pruned Large Language Models

May 30, 2025

Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity

May 28, 2025

Faster and Better LLMs via Latency-Aware Test-Time Scaling

May 26, 2025