Haojie Duanmu

6Bit-Diffusion: Inference-Time Mixed-Precision Quantization for Video Diffusion Models
Mar 19, 2026
Via arXiv

MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design
May 09, 2025
Via arXiv

SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models
May 10, 2024
Via arXiv

WKVQuant: Quantizing Weight and Key/Value Cache for Large Language Models Gains More
Feb 20, 2024
Via arXiv