Cunxiao Du

ToolSpec: Accelerating Tool Calling via Schema-Aware and Retrieval-Augmented Speculative Decoding

Apr 15, 2026

Annealed Relaxation of Speculative Decoding for Faster Autoregressive Image Generation

Jan 14, 2026

Demystifying the Slash Pattern in Attention: The Role of RoPE

Jan 13, 2026

Sparse-to-Dense: A Free Lunch for Lossless Acceleration of Video Understanding in LLMs

May 25, 2025

BanditSpec: Adaptive Speculative Decoding via Bandit Algorithms

May 21, 2025

LongSpec: Long-Context Speculative Decoding with Efficient Drafting and Verification

Feb 24, 2025

Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs

Feb 18, 2025

When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training

Nov 20, 2024

SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction

Oct 17, 2024

When Attention Sink Emerges in Language Models: An Empirical View

Oct 14, 2024