Sayan Nag

MAGNET: A Multi-agent Framework for Finding Audio-Visual Needles by Reasoning over Multi-Video Haystacks

Jun 08, 2025

Localizing Knowledge in Diffusion Transformers

May 24, 2025

Aurelia: Test-time Reasoning Distillation in Audio-Visual LLMs

Mar 29, 2025

AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs

Jan 03, 2025

SafaRi: Adaptive Sequence Transformer for Weakly Supervised Referring Expression Segmentation

Jul 02, 2024

Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time

Jul 01, 2024

MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models

Jun 07, 2024

Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model

Dec 19, 2023

APoLLo: Unified Adapter and Prompt Learning for Vision Language Models

Dec 04, 2023

EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone

Jul 11, 2023