Zihao Ye

University of Washington

Atom: Low-bit Quantization for Efficient and Accurate LLM Serving

Nov 07, 2023
Yilong Zhao, Chien-Yu Lin, Kan Zhu, Zihao Ye, Lequn Chen, Size Zheng, Luis Ceze, Arvind Krishnamurthy, Tianqi Chen, Baris Kasikci

Relax: Composable Abstractions for End-to-End Dynamic Machine Learning

Nov 01, 2023
Ruihang Lai, Junru Shao, Siyuan Feng, Steven S. Lyubomirsky, Bohan Hou, Wuwei Lin, Zihao Ye, Hongyi Jin, Yuchen Jin, Jiawei Liu, Lesheng Jin, Yaxing Cai, Ziheng Jiang, Yong Wu, Sunghyun Park, Prakalp Srivastava, Jared G. Roesch, Todd C. Mowry, Tianqi Chen

Punica: Multi-Tenant LoRA Serving

Oct 28, 2023
Lequn Chen, Zihao Ye, Yongji Wu, Danyang Zhuo, Luis Ceze, Arvind Krishnamurthy

SparseTIR: Composable Abstractions for Sparse Compilation in Deep Learning

Jul 11, 2022
Zihao Ye, Ruihang Lai, Junru Shao, Tianqi Chen, Luis Ceze

TensorIR: An Abstraction for Automatic Tensorized Program Optimization

Jul 09, 2022
Siyuan Feng, Bohan Hou, Hongyi Jin, Wuwei Lin, Junru Shao, Ruihang Lai, Zihao Ye, Lianmin Zheng, Cody Hao Yu, Yong Yu, Tianqi Chen

FeatGraph: A Flexible and Efficient Backend for Graph Neural Network Systems

Sep 29, 2020
Yuwei Hu, Zihao Ye, Minjie Wang, Jiali Yu, Da Zheng, Mu Li, Zheng Zhang, Zhiru Zhang, Yida Wang

Transformer on a Diet

Feb 14, 2020
Chenguang Wang, Zihao Ye, Aston Zhang, Zheng Zhang, Alexander J. Smola

BP-Transformer: Modelling Long-Range Context via Binary Partitioning

Nov 11, 2019
Zihao Ye, Qipeng Guo, Quan Gan, Xipeng Qiu, Zheng Zhang
