
Shengen Yan

CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios

Sep 16, 2024

Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs

Jul 01, 2024

MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression

Jun 21, 2024

DiTFastAttn: Attention Compression for Diffusion Transformer Models

Jun 12, 2024

ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation

Jun 04, 2024

MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization

May 30, 2024

HetHub: A Heterogeneous distributed hybrid training system for large-scale models

May 25, 2024

A Survey on Efficient Inference for Large Language Models

Apr 22, 2024

Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better

Apr 08, 2024

Evaluating Quantized Large Language Models

Feb 28, 2024