Mingu Lee

KeyDiff: Key Similarity-Based KV Cache Eviction for Long-Context LLM Inference in Resource-Constrained Environments

Apr 23, 2025

KeyDiff: Key Similarity-Based KV Cache Eviction for Long-Context LLM Inference in Resource-Constrained Environments

Apr 21, 2025

CAOTE: KV Caching through Attention Output Error based Token Eviction

Apr 18, 2025

AdaEDL: Early Draft Stopping for Speculative Decoding of Large Language Models via an Entropy-based Lower Bound on Token Acceptance Probability

Oct 24, 2024

Live Fitness Coaching as a Testbed for Situated Interaction

Jul 11, 2024

ToSA: Token Selective Attention for Efficient Vision Transformers

Jun 13, 2024

On Speculative Decoding for Multimodal Large Language Models

Apr 13, 2024

HyperCLOVA X Technical Report

Apr 13, 2024

Direct Alignment of Draft Model for Speculative Decoding with Chat-Fine-Tuned LLMs

Mar 08, 2024

Recursive Speculative Decoding: Accelerating LLM Inference via Sampling Without Replacement

Mar 05, 2024