Jingwen Leng

Yggdrasil: Bridging Dynamic Speculation and Static Runtime for Latency-Optimal Tree-Based LLM Decoding

Dec 29, 2025

TIMERIPPLE: Accelerating vDiTs by Understanding the Spatio-Temporal Correlations in Latent Space

Nov 15, 2025

ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive

Aug 26, 2025

Astraea: A GPU-Oriented Token-wise Acceleration Framework for Video Diffusion Transformers

Jun 06, 2025

An Efficient Private GPT Never Autoregressively Decodes

May 21, 2025

WeightedKV: Attention Scores Weighted Key-Value Cache Merging for Large Language Models

Mar 03, 2025

TreeKV: Smooth Key-Value Cache Compression with Tree Structures

Jan 09, 2025

Nimbus: Secure and Efficient Two-Party Inference for Transformers

Nov 24, 2024

vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving

Jul 22, 2024

Fovea Transformer: Efficient Long-Context Modeling with Structured Fine-to-Coarse Attention

Nov 13, 2023