Zhenhua Han

Agentic Harness Engineering: Observability-Driven Automatic Evolution of Coding-Agent Harnesses

Apr 28, 2026

EVPO: Explained Variance Policy Optimization for Adaptive Critic Utilization in LLM Post-Training

Apr 21, 2026

MM-Doc-R1: Training Agents for Long Document Visual Question Answering through Multi-turn Reinforcement Learning

Apr 15, 2026

Building Self-Evolving Agents via Experience-Driven Lifelong Learning: A Framework and Benchmark

Aug 26, 2025

Efficient Serving of LLM Applications with Probabilistic Demand Modeling

Jun 17, 2025

Efficient Unified Caching for Accelerating Heterogeneous AI Workloads

Jun 14, 2025

Speech-Language Models with Decoupled Tokenizers and Multi-Token Prediction

Jun 14, 2025

Real-Time Neural-Enhancement for Online Cloud Gaming

Jan 12, 2025

RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval

Sep 16, 2024

MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention

Jul 02, 2024