Baris Kasikci

Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models

Feb 10, 2024
Keisuke Kamahori, Yile Gu, Kan Zhu, Baris Kasikci

Via arXiv

Atom: Low-bit Quantization for Efficient and Accurate LLM Serving

Nov 07, 2023
Yilong Zhao, Chien-Yu Lin, Kan Zhu, Zihao Ye, Lequn Chen, Size Zheng, Luis Ceze, Arvind Krishnamurthy, Tianqi Chen, Baris Kasikci

Via arXiv