Souvik Kundu

Fast and Cost-effective Speculative Edge-Cloud Decoding with Early Exits
May 27, 2025

Mitigating Hallucinations in Vision-Language Models through Image-Guided Head Suppression
May 22, 2025

Accelerating LLM Inference with Flexible N:M Sparsity via A Fully Digital Compute-in-Memory Accelerator
Apr 19, 2025

Understanding and Optimizing Multi-Stage AI Inference Pipelines
Apr 16, 2025

OuroMamba: A Data-Free Quantization Framework for Vision Mamba Models
Mar 13, 2025

Enhancing Large Language Models for Hardware Verification: A Novel SystemVerilog Assertion Dataset
Mar 11, 2025

LVLM-Compress-Bench: Benchmarking the Broader Impact of Large Vision-Language Model Compression
Mar 06, 2025

LANTERN++: Enhanced Relaxed Speculative Decoding with Static Tree Drafting for Visual Auto-regressive Models
Feb 10, 2025

CITER: Collaborative Inference for Efficient Large Language Model Decoding with Token-Level Routing
Feb 04, 2025

Unraveling Zeroth-Order Optimization through the Lens of Low-Dimensional Structured Perturbations
Jan 31, 2025