Zhenhua Han

RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval

Sep 16, 2024

MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention

Jul 02, 2024

Parrot: Efficient Serving of LLM-based Applications with Semantic Variable

May 30, 2024

Dissecting Arbitrary-scale Super-resolution Capability from Pre-trained Diffusion Generative Models

Jun 01, 2023

Online Video Streaming Super-Resolution with Adaptive Look-Up Table Fusion

Mar 01, 2023

SparDA: Accelerating Dynamic Sparse Deep Neural Networks via Sparse-Dense Transformation

Jan 26, 2023