Pengfei Zuo

Serving Large Language Models on Huawei CloudMatrix384

Jun 15, 2025
Via arXiv

Efficient Unified Caching for Accelerating Heterogeneous AI Workloads

Jun 14, 2025
Via arXiv

Injecting Adrenaline into LLM Serving: Boosting Resource Utilization and Throughput via Attention Disaggregation

Mar 26, 2025
Via arXiv

AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference

Jan 04, 2025
Via arXiv

AttentionStore: Cost-effective Attention Reuse across Multi-turn Conversations in Large Language Model Serving

Mar 23, 2024
Via arXiv

A Scalable Learned Index Scheme in Storage Systems

May 08, 2019
Via arXiv