Raghavv Goel

KeyDiff: Key Similarity-Based KV Cache Eviction for Long-Context LLM Inference in Resource-Constrained Environments

Apr 23, 2025

KeyDiff: Key Similarity-Based KV Cache Eviction for Long-Context LLM Inference in Resource-Constrained Environments

Apr 21, 2025

CAOTE: KV Caching through Attention Output Error based Token Eviction

Apr 18, 2025

On Speculative Decoding for Multimodal Large Language Models

Apr 13, 2024

Direct Alignment of Draft Model for Speculative Decoding with Chat-Fine-Tuned LLMs

Mar 08, 2024

Recursive Speculative Decoding: Accelerating LLM Inference via Sampling Without Replacement

Mar 05, 2024

Motion Informed Needle Segmentation in Ultrasound Images

Dec 05, 2023