Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads

Jan 19, 2024
Tianle Cai, Yuhong Li, Zhengyang Geng, Hongwu Peng, Jason D. Lee, Deming Chen, Tri Dao
