Shang Yang

DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

Oct 14, 2024

HART: Efficient Visual Generation with Hybrid Autoregressive Transformer

Oct 14, 2024

Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models

Oct 14, 2024

LongVILA: Scaling Long-Context Visual Language Models for Long Videos

Aug 21, 2024

Sparse Refinement for Efficient High-Resolution Semantic Segmentation

Jul 26, 2024

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving

May 07, 2024

AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Jun 01, 2023

FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer

Jan 20, 2023