Youngjoon Jang

Fork-Merge Decoding: Enhancing Multimodal Understanding in Audio-Visual Large Language Models

May 27, 2025
Via arXiv

AVCD: Mitigating Hallucinations in Audio-Visual Large Language Models through Contrastive Decoding

May 27, 2025
Via arXiv

Test-Time Augmentation for Pose-invariant Face Recognition

May 14, 2025
Via arXiv

Deep Understanding of Sign Language for Sign to Subtitle Alignment

Mar 05, 2025
Via arXiv

Lost in Translation, Found in Context: Sign Language Translation with Contextual Cues

Jan 16, 2025
Via arXiv

VoiceDiT: Dual-Condition Diffusion Transformer for Environment-Aware Speech Synthesis

Dec 26, 2024
Via arXiv

Let Me Finish My Sentence: Video Temporal Grounding with Holistic Text Understanding

Oct 17, 2024
Via arXiv

Faces that Speak: Jointly Synthesising Talking Face and Speech from Text

May 16, 2024
Via arXiv

FreGrad: Lightweight and Fast Frequency-aware Diffusion Vocoder

Jan 18, 2024
Via arXiv

Seeing Through the Conversation: Audio-Visual Speech Separation based on Diffusion Model

Oct 30, 2023
Via arXiv