Audio Visual Event Localization


RA-SSU: Towards Fine-Grained Audio-Visual Learning with Region-Aware Sound Source Understanding

Mar 10, 2026
Via arXiv

Logics-Parsing-Omni Technical Report

Mar 12, 2026
Via arXiv

WISE: A Multimodal Search Engine for Visual Scenes, Audio, Objects, Faces, Speech, and Metadata

Feb 13, 2026
Via arXiv

Cross-Modal Binary Attention: An Energy-Efficient Fusion Framework for Audio-Visual Learning

Jan 31, 2026
Via arXiv

BickGraphing: Web-Based Application for Visual Inspection of Audio Recordings

Jan 14, 2026
Via arXiv

ToS: A Team of Specialists ensemble framework for Stereo Sound Event Localization and Detection with distance estimation in Video

Jan 24, 2026
Via arXiv

EchoFoley: Event-Centric Hierarchical Control for Video Grounded Creative Sound Generation

Dec 31, 2025
Via arXiv

OmniAgent: Audio-Guided Active Perception Agent for Omnimodal Audio-Video Understanding

Dec 29, 2025
Via arXiv

CLASP: Cross-modal Salient Anchor-based Semantic Propagation for Weakly-supervised Dense Audio-Visual Event Localization

Aug 06, 2025
Via arXiv

VP-SelDoA: Visual-prompted Selective DoA Estimation of Target Sound via Semantic-Spatial Matching

Jul 10, 2025
Via arXiv