Nikhil Bhendawade

Speculative Streaming: Fast LLM Inference without Auxiliary Models

Feb 16, 2024
Nikhil Bhendawade, Irina Belousova, Qichen Fu, Henry Mason, Mohammad Rastegari, Mahyar Najibi

FastSeq: Make Sequence Generation Faster

Jun 08, 2021
Yu Yan, Fei Hu, Jiusheng Chen, Nikhil Bhendawade, Ting Ye, Yeyun Gong, Nan Duan, Desheng Cui, Bingyu Chi, Ruofei Zhang

EL-Attention: Memory Efficient Lossless Attention for Generation

May 11, 2021
Yu Yan, Jiusheng Chen, Weizhen Qi, Nikhil Bhendawade, Yeyun Gong, Nan Duan, Ruofei Zhang
