Xuefei Ning

Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs

Jul 01, 2024

MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression

Jun 21, 2024

Can LLMs Learn by Teaching? A Preliminary Study

Jun 20, 2024

DiTFastAttn: Attention Compression for Diffusion Transformer Models

Jun 12, 2024

ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation

Jun 04, 2024

MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization

May 30, 2024

HetHub: A Heterogeneous distributed hybrid training system for large-scale models

May 25, 2024

DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis

May 23, 2024

A Survey on Efficient Inference for Large Language Models

Apr 22, 2024

Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better

Apr 08, 2024