Hai Zhao

Department of Computer Science and Engineering, Shanghai Jiao Tong University, Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University

RACER: Retrieval-Augmented Contextual Rapid Speculative Decoding

Apr 16, 2026

TrigReason: Trigger-Based Collaboration between Small and Large Reasoning Models

Apr 16, 2026

Remember Me, Refine Me: A Dynamic Procedural Memory Framework for Experience-Driven Agent Evolution

Dec 11, 2025

CoViPAL: Layer-wise Contextualized Visual Token Pruning for Large Vision-Language Models

Aug 24, 2025

DAC: A Dynamic Attention-aware Approach for Task-Agnostic Prompt Compression

Jul 16, 2025

IAM: Efficient Inference through Attention Mapping between Different-scale LLMs

Jul 16, 2025

Plan Your Travel and Travel with Your Plan: Wide-Horizon Planning and Evaluation via LLM

Jun 14, 2025

MaskSearch: A Universal Pre-Training Framework to Enhance Agentic Search Capability

May 27, 2025

Segment First or Comprehend First? Explore the Limit of Unsupervised Word Segmentation with Large Language Models

May 26, 2025

Faster MoE LLM Inference for Extremely Large Models

May 06, 2025