Zero Shot Audio Captioning


MAGIC-Enhanced Keyword Prompting for Zero-Shot Audio Captioning with CLIP Models

Sep 16, 2025

Jamendo-QA: A Large-Scale Music Question Answering Dataset

Sep 19, 2025

RFM-Editing: Rectified Flow Matching for Text-guided Audio Editing

Sep 17, 2025

VeS: Teaching Pixels to Listen Without Supervision

Jul 29, 2025

ARC-Hunyuan-Video-7B: Structured Video Comprehension of Real-World Shorts

Jul 28, 2025

AC/DC: LLM-based Audio Comprehension via Dialogue Continuation

Jun 12, 2025

TACOS: Temporally-aligned Audio CaptiOnS for Language-Audio Pretraining

May 12, 2025

AudSemThinker: Enhancing Audio-Language Models through Reasoning over Semantics of Sound

May 20, 2025

Sat2Sound: A Unified Framework for Zero-Shot Soundscape Mapping

May 19, 2025

FocusedAD: Character-centric Movie Audio Description

Apr 16, 2025