Zihao Ye

University of Washington

AVO: Agentic Variation Operators for Autonomous Evolutionary Search

Mar 25, 2026

SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels Against Hardware Limits

Mar 19, 2026

Axe: A Simple Unified Layout Abstraction for Machine Learning Compilers

Jan 27, 2026

FlashInfer-Bench: Building the Virtuous Cycle for AI-driven LLM Systems

Jan 01, 2026

Mirage Persistent Kernel: A Compiler and Runtime for Mega-Kernelizing Tensor Programs

Dec 22, 2025

TeleRAG: Efficient Retrieval-Augmented Generation Inference with Lookahead Retrieval

Feb 28, 2025

FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving

Jan 02, 2025

MagicPIG: LSH Sampling for Efficient LLM Generation

Oct 21, 2024

Improving Image De-raining Using Reference-Guided Transformers

Aug 01, 2024

Atom: Low-bit Quantization for Efficient and Accurate LLM Serving

Nov 07, 2023