
Yao Fu

When Truthful Representations Flip Under Deceptive Instructions?

Jul 29, 2025

FAEDKV: Infinite-Window Fourier Transform for Unbiased KV Cache Compression

Jul 26, 2025

HybridServe: Efficient Serving of Large AI Models with Confidence-Based Cascade Routing

May 18, 2025

MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems

May 16, 2025

MoE-CAP: Cost-Accuracy-Performance Benchmarking for Mixture-of-Experts Systems

Dec 10, 2024

Dynamic Self-Distillation via Previous Mini-batches for Fine-tuning Small Language Models

Nov 25, 2024

Interactive and Expressive Code-Augmented Planning with Large Language Models

Nov 21, 2024

DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

Oct 14, 2024

ProTrain: Efficient LLM Training via Memory-Aware Techniques

Jun 12, 2024

Challenges in Deploying Long-Context Transformers: A Theoretical Peak Performance Analysis

May 14, 2024