Zhihao Jia

FastKernels: Benchmarking GPU Kernel Generation in Production

May 22, 2026

Coral: Cost-Efficient Multi-LLM Serving over Heterogeneous Cloud GPUs

May 05, 2026

DLink: Distilling Layer-wise and Dominant Knowledge from EEG Foundation Models

Apr 16, 2026

Prism: Symbolic Superoptimization of Tensor Programs

Apr 16, 2026

Event Tensor: A Unified Abstraction for Compiling Dynamic Megakernel

Apr 14, 2026

Mirage Persistent Kernel: A Compiler and Runtime for Mega-Kernelizing Tensor Programs

Dec 22, 2025

Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning

Aug 09, 2025

Nexus: Taming Throughput-Latency Tradeoff in LLM Serving via Efficient GPU Sharing

Jul 09, 2025

DDO: Dual-Decision Optimization via Multi-Agent Collaboration for LLM-Based Medical Consultation

May 24, 2025

SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning

Apr 10, 2025