Haojie Duanmu

MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design

May 09, 2025
Via arXiv

SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models

May 10, 2024
Via arXiv

WKVQuant: Quantizing Weight and Key/Value Cache for Large Language Models Gains More

Feb 20, 2024
Via arXiv