Souvik Kundu

Callie

Sparrow: Sparse Rollout for Stable and Efficient Long-context RL of Large Language Models

Jun 07, 2026

MOSAIC: Efficient Mixture-of-Agent Scheduling via Adaptive Aggregation and Inference Concurrency

Jun 02, 2026

How Far Can Disaggregation Go? A Design-Space Exploration of Attention-FFN Disaggregation for Efficient MoE LLM Serving

May 27, 2026

One Token Away from Collapse: The Fragility of Instruction-Tuned Helpfulness

Apr 14, 2026

Power-SMC: Low-Latency Sequence-Level Power Sampling for Training-Free LLM Reasoning

Feb 10, 2026

COBRA: Catastrophic Bit-flip Reliability Analysis of State-Space Models

Dec 22, 2025

SkipKV: Selective Skipping of KV Generation and Storage for Efficient Inference with Large Reasoning Models

Dec 08, 2025

RedVTP: Training-Free Acceleration of Diffusion Vision-Language Models Inference via Masked Token-Guided Visual Token Pruning

Nov 16, 2025

On-the-Fly Adaptive Distillation of Transformer to Dual-State Linear Attention

Jun 12, 2025

Fast and Cost-effective Speculative Edge-Cloud Decoding with Early Exits

May 27, 2025