Manan Suri

Video2LoRA: Parametric Video Internalization for Vision-Language Models

Jun 03, 2026
Via arXiv

DRAGON: A Benchmark for Evidence-Grounded Visual Reasoning over Diagrams

Apr 28, 2026
Via arXiv

Learning Illumination Control in Diffusion Models

Apr 27, 2026
Via arXiv

Structured Uncertainty guided Clarification for LLM Agents

Nov 11, 2025
Via arXiv

ChartLens: Fine-grained Visual Attribution in Charts

May 25, 2025
Via arXiv

Mitigating Memorization in LLMs using Activation Steering

Mar 08, 2025
Via arXiv

VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation

Dec 14, 2024
Via arXiv

x-RAGE: eXtended Reality -- Action & Gesture Events Dataset

Oct 25, 2024
Via arXiv

DocEdit-v2: Document Structure Editing Via Multimodal LLM Grounding

Oct 21, 2024
Via arXiv

Non-Invasive Qualitative Vibration Analysis using Event Camera

Oct 18, 2024
Via arXiv