
Souvik Kundu


COBRA: Catastrophic Bit-flip Reliability Analysis of State-Space Models

Dec 22, 2025

SkipKV: Selective Skipping of KV Generation and Storage for Efficient Inference with Large Reasoning Models

Dec 08, 2025

RedVTP: Training-Free Acceleration of Diffusion Vision-Language Models Inference via Masked Token-Guided Visual Token Pruning

Nov 16, 2025

On-the-Fly Adaptive Distillation of Transformer to Dual-State Linear Attention

Jun 12, 2025

Fast and Cost-effective Speculative Edge-Cloud Decoding with Early Exits

May 27, 2025

Mitigating Hallucinations in Vision-Language Models through Image-Guided Head Suppression

May 22, 2025

Accelerating LLM Inference with Flexible N:M Sparsity via A Fully Digital Compute-in-Memory Accelerator

Apr 19, 2025

Understanding and Optimizing Multi-Stage AI Inference Pipelines

Apr 16, 2025

OuroMamba: A Data-Free Quantization Framework for Vision Mamba Models

Mar 13, 2025

Enhancing Large Language Models for Hardware Verification: A Novel SystemVerilog Assertion Dataset

Mar 11, 2025