Ramani Duraiswami

Video-Robin: Autoregressive Diffusion Planning for Intent-Grounded Video-to-Music Generation

Apr 19, 2026

Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music

Apr 13, 2026

On The Application of Linear Attention in Multimodal Transformers

Apr 11, 2026

MMOU: A Massive Multi-Task Omni Understanding and Reasoning Benchmark for Long and Complex Real-World Videos

Mar 14, 2026

Music Flamingo: Scaling Music Understanding in Audio Language Models

Nov 13, 2025

SPUR: A Plug-and-Play Framework for Integrating Spatial Audio Understanding and Reasoning into Large Audio-Language Models

Nov 13, 2025

AURA: A Fine-Grained Benchmark and Decomposed Metric for Audio-Visual Reasoning

Aug 10, 2025

Multi-Domain Audio Question Answering Toward Acoustic Content Reasoning in The DCASE 2025 Challenge

May 12, 2025

ProSE: Diffusion Priors for Speech Enhancement

Mar 09, 2025

Exploiting Sparsity for Long Context Inference: Million Token Contexts on Commodity GPUs

Feb 10, 2025