Kim Sung-Bin

AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation

Apr 29, 2025
Via arXiv

VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models

Apr 03, 2025
Via arXiv

Perceptually Accurate 3D Talking Head Generation: New Definitions, Speech-Mesh Representation, and Evaluation Metrics

Mar 27, 2025
Via arXiv

SoundBrush: Sound as a Brush for Visual Scene Editing

Dec 31, 2024
Via arXiv

Sound2Vision: Generating Diverse Visuals from Audio through Cross-Modal Latent Alignment

Dec 09, 2024
Via arXiv

AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models

Oct 23, 2024
Via arXiv

Enhancing Speech-Driven 3D Facial Animation with Audio-Visual Guidance from Lip Reading Expert

Jul 01, 2024
Via arXiv

MultiTalk: Enhancing 3D Talking Head Generation Across Languages with Multilingual Video Dataset

Jun 20, 2024
Via arXiv

Revisiting Learning-based Video Motion Magnification for Real-time Processing

Mar 04, 2024
Via arXiv

SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models

Dec 15, 2023
Via arXiv