
Haojie Duanmu

MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design

May 09, 2025

SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models

May 10, 2024

WKVQuant: Quantizing Weight and Key/Value Cache for Large Language Models Gains More

Feb 20, 2024