Yilong Zhao

Atom: Low-bit Quantization for Efficient and Accurate LLM Serving

Nov 07, 2023
Yilong Zhao, Chien-Yu Lin, Kan Zhu, Zihao Ye, Lequn Chen, Size Zheng, Luis Ceze, Arvind Krishnamurthy, Tianqi Chen, Baris Kasikci

Neural-PIM: Efficient Processing-In-Memory with Neural Approximation of Peripherals

Jan 30, 2022
Weidong Cao, Yilong Zhao, Adith Boloor, Yinhe Han, Xuan Zhang, Li Jiang

SME: ReRAM-based Sparse-Multiplication-Engine to Squeeze-Out Bit Sparsity of Neural Network

Mar 02, 2021
Fangxin Liu, Wenbo Zhao, Yilong Zhao, Zongwu Wang, Tao Yang, Zhezhi He, Naifeng Jing, Xiaoyao Liang, Li Jiang

An Ultra-Efficient Memristor-Based DNN Framework with Structured Weight Pruning and Quantization Using ADMM

Aug 29, 2019
Geng Yuan, Xiaolong Ma, Caiwen Ding, Sheng Lin, Tianyun Zhang, Zeinab S. Jalali, Yilong Zhao, Li Jiang, Sucheta Soundarajan, Yanzhi Wang
