Jeongsoo Choi

DiFlowDubber: Discrete Flow Matching for Automated Video Dubbing via Cross-Modal Alignment and Synchronization

Mar 17, 2026
Via arXiv

Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis

Sep 26, 2025
Via arXiv

Dub-S2ST: Textless Speech-to-Speech Translation for Seamless Dubbing

May 27, 2025
Via arXiv

Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment

May 26, 2025
Via arXiv

AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation

Apr 29, 2025
Via arXiv

VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models

Apr 03, 2025
Via arXiv

MAVFlow: Preserving Paralinguistic Elements with Conditional Flow Matching for Zero-Shot AV2AV Multilingual Translation

Mar 14, 2025
Via arXiv

Deep Understanding of Sign Language for Sign to Subtitle Alignment

Mar 05, 2025
Via arXiv

V2SFlow: Video-to-Speech Generation with Speech Decomposition and Rectified Flow

Nov 29, 2024
Via arXiv

ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation

Oct 27, 2024
Via arXiv