Dinesh Manocha

AMUSE: Audio-Visual Benchmark and Alignment Framework for Agentic Multi-Speaker Understanding

Dec 18, 2025
Via arXiv

DR. Nav: Semantic-Geometric Representations for Proactive Dead-End Recovery and Navigation

Nov 16, 2025
Via arXiv

SPUR: A Plug-and-Play Framework for Integrating Spatial Audio Understanding and Reasoning into Large Audio-Language Models

Nov 13, 2025
Via arXiv

Music Flamingo: Scaling Music Understanding in Audio Language Models

Nov 13, 2025
Via arXiv

Structured Uncertainty guided Clarification for LLM Agents

Nov 11, 2025
Via arXiv

MoRe: Monocular Geometry Refinement via Graph Optimization for Cross-View Consistency

Oct 08, 2025
Via arXiv

NavMoE: Hybrid Model- and Learning-based Traversability Estimation for Local Navigation via Mixture of Experts

Sep 16, 2025
Via arXiv

A Survey on Long-Video Storytelling Generation: Architectures, Consistency, and Cinematic Quality

Jul 09, 2025
Via arXiv

MAGNET: A Multi-agent Framework for Finding Audio-Visual Needles by Reasoning over Multi-Video Haystacks

Jun 08, 2025
Via arXiv

UAV4D: Dynamic Neural Rendering of Human-Centric UAV Imagery using Gaussian Splatting

Jun 05, 2025
Via arXiv