Haichen Zhang

Dual-Pool Token-Budget Routing for Cost-Efficient and Reliable LLM Serving

Apr 09, 2026
Via arXiv

Knowledge Access Beats Model Size: Memory Augmented Routing for Persistent AI Agents

Mar 24, 2026
Via arXiv

Visual Confused Deputy: Exploiting and Defending Perception Failures in Computer-Using Agents

Mar 16, 2026
Via arXiv

98× Faster LLM Routing Without a Dedicated GPU: Flash Attention, Prompt Compression, and Near-Streaming for the vLLM Semantic Router

Mar 13, 2026
Via arXiv

Adaptive Vision-Language Model Routing for Computer Use Agents

Mar 13, 2026
Via arXiv