Yakun Sophia Shao

KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

Feb 07, 2024
Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Michael W. Mahoney, Yakun Sophia Shao, Kurt Keutzer, Amir Gholami

MoCA: Memory-Centric, Adaptive Execution for Multi-Tenant Deep Neural Networks

May 10, 2023
Seah Kim, Hasan Genc, Vadim Vadimovich Nikiforov, Krste Asanović, Borivoje Nikolić, Yakun Sophia Shao

Full Stack Optimization of Transformer Inference: a Survey

Feb 27, 2023
Sehoon Kim, Coleman Hooper, Thanakul Wattanawong, Minwoo Kang, Ruohan Yan, Hasan Genc, Grace Dinh, Qijing Huang, Kurt Keutzer, Michael W. Mahoney, Yakun Sophia Shao, Amir Gholami

CoSA: Scheduling by Constrained Optimization for Spatial Accelerators

May 05, 2021
Qijing Huang, Minwoo Kang, Grace Dinh, Thomas Norell, Aravind Kalaiah, James Demmel, John Wawrzynek, Yakun Sophia Shao

Efficient emotion recognition using hyperdimensional computing with combinatorial channel encoding and cellular automata

Apr 06, 2021
Alisha Menon, Anirudh Natarajan, Reva Agashe, Daniel Sun, Melvin Aristio, Harrison Liew, Yakun Sophia Shao, Jan M. Rabaey

Gemmini: An Agile Systolic Array Generator Enabling Systematic Evaluations of Deep-Learning Architectures

Dec 07, 2019
Hasan Genc, Ameer Haj-Ali, Vighnesh Iyer, Alon Amid, Howard Mao, John Wright, Colin Schmidt, Jerry Zhao, Albert Ou, Max Banister, Yakun Sophia Shao, Borivoje Nikolić, Ion Stoica, Krste Asanović
