Jingwen Leng

Astraea: A GPU-Oriented Token-wise Acceleration Framework for Video Diffusion Transformers

Jun 06, 2025

An Efficient Private GPT Never Autoregressively Decodes

May 21, 2025

WeightedKV: Attention Scores Weighted Key-Value Cache Merging for Large Language Models

Mar 03, 2025

TreeKV: Smooth Key-Value Cache Compression with Tree Structures

Jan 09, 2025

Nimbus: Secure and Efficient Two-Party Inference for Transformers

Nov 24, 2024

vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving

Jul 22, 2024

Fovea Transformer: Efficient Long-Context Modeling with Structured Fine-to-Coarse Attention

Nov 13, 2023

Accelerating Generic Graph Neural Networks via Architecture, Compiler, Partition Method Co-Design

Aug 16, 2023

AdaptGear: Accelerating GNN Training via Adaptive Subgraph-Level Kernels on GPUs

May 27, 2023

Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator

May 24, 2023