Joon Son Chung

Dub-S2ST: Textless Speech-to-Speech Translation for Seamless Dubbing

May 27, 2025
Via arXiv

AVCD: Mitigating Hallucinations in Audio-Visual Large Language Models through Contrastive Decoding

May 27, 2025
Via arXiv

Fork-Merge Decoding: Enhancing Multimodal Understanding in Audio-Visual Large Language Models

May 27, 2025
Via arXiv

Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment

May 26, 2025
Via arXiv

SEED: Speaker Embedding Enhancement Diffusion Model

May 22, 2025
Via arXiv

Test-Time Augmentation for Pose-invariant Face Recognition

May 14, 2025
Via arXiv

Hearing and Seeing Through CLIP: A Framework for Self-Supervised Sound Source Localization

May 08, 2025
Via arXiv

AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation

Apr 29, 2025
Via arXiv

VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models

Apr 03, 2025
Via arXiv

Seeing Speech and Sound: Distinguishing and Locating Audios in Visual Scenes

Mar 24, 2025
Via arXiv