Shengen Yan

Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs

Jul 01, 2024

MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression

Jun 21, 2024

DiTFastAttn: Attention Compression for Diffusion Transformer Models

Jun 12, 2024

ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation

Jun 04, 2024

MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization

May 30, 2024

HetHub: A Heterogeneous Distributed Hybrid Training System for Large-scale Models

May 25, 2024

A Survey on Efficient Inference for Large Language Models

Apr 22, 2024

Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better

Apr 08, 2024

Evaluating Quantized Large Language Models

Feb 28, 2024

LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K

Feb 06, 2024