
Yilong Zhao

LEANN: A Low-Storage Vector Index

Jun 09, 2025

Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation

May 24, 2025

Sparse VideoGen: Accelerating Video Diffusion Transformers with Spatial-Temporal Sparsity

Feb 03, 2025

BlendServe: Optimizing Offline Inference for Auto-regressive Large Models with Resource-aware Batching

Nov 25, 2024

XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models

Nov 22, 2024

A Large Language Model-based Framework for Semi-Structured Tender Document Retrieval-Augmented Generation

Oct 04, 2024

Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference

Jun 16, 2024

Atom: Low-bit Quantization for Efficient and Accurate LLM Serving

Nov 07, 2023

Neural-PIM: Efficient Processing-In-Memory with Neural Approximation of Peripherals

Jan 30, 2022

SME: ReRAM-based Sparse-Multiplication-Engine to Squeeze-Out Bit Sparsity of Neural Network

Mar 02, 2021